Posted to gitbox@hive.apache.org by GitBox <gi...@apache.org> on 2021/12/11 14:06:44 UTC

[GitHub] [hive] pvary opened a new pull request #2865: HIVE-25792: Multi Insert query fails on CBO path

pvary opened a new pull request #2865:
URL: https://github.com/apache/hive/pull/2865


   ### What changes were proposed in this pull request?
   Add a full recompilation retry (without CBO) when CBO fails.
   
   ### Why are the changes needed?
   We have seen multiple places where we were unable to correctly recreate the AST after a failed CBO attempt. We needed to find a less brittle solution.
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   Added unit tests.
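The core idea of the change can be sketched in a few lines. This is a simplified, self-contained model, not Hive's actual `Driver`/`ReExecDriver` code: `CboFailure`, `RecompileSketch` and `compileWithFallback` are hypothetical names, and "compiling" is reduced to a function that either returns a plan string or throws.

```java
import java.util.function.BiFunction;

/** Hypothetical stand-in for a CBO-specific compilation failure. */
class CboFailure extends RuntimeException {
  CboFailure(String message) {
    super(message);
  }
}

class RecompileSketch {
  /**
   * Compiles the query with CBO enabled. If CBO fails, nothing from the failed
   * attempt is reused: the query text is compiled again from scratch with CBO
   * disabled, instead of trying to repair the AST left behind by the failure.
   *
   * @param compiler hypothetical compile step: (query, cboEnabled) -> plan
   */
  static String compileWithFallback(String query, BiFunction<String, Boolean, String> compiler) {
    try {
      return compiler.apply(query, true);
    } catch (CboFailure e) {
      // Full retry from the query text, this time without CBO.
      return compiler.apply(query, false);
    }
  }

  public static void main(String[] args) {
    BiFunction<String, Boolean, String> compiler = (q, cbo) -> {
      if (cbo && q.startsWith("from")) {
        throw new CboFailure("simulated CBO failure");  // pretend CBO cannot plan this query
      }
      return "plan(" + q + ", cbo=" + cbo + ")";
    };
    System.out.println(compileWithFallback("select 1", compiler));          // plan(select 1, cbo=true)
    System.out.println(compileWithFallback("from t insert ...", compiler)); // plan(from t insert ..., cbo=false)
  }
}
```

Recompiling from the original query text is what makes this less brittle than rebuilding the AST: the second attempt never sees state mutated by the first.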


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kgyrtkirk commented on a change in pull request #2865: HIVE-25792: Multi Insert query fails on CBO path

Posted by GitBox <gi...@apache.org>.
kgyrtkirk commented on a change in pull request #2865:
URL: https://github.com/apache/hive/pull/2865#discussion_r770544264



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/Driver.java
##########
@@ -515,14 +516,16 @@ private void preparForCompile(boolean resetTaskIds) throws CommandProcessorExcep
   }
 
   private void prepareContext() throws CommandProcessorException {
+    String originalCboInfo = context != null ? context.cboInfo : null;

Review comment:
       I was thinking of saving the `cboInfo` in `Hook#afterCompile` when the exception happens, and restoring it at the next invocation of `Hook#beforeCompile`.
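A minimal sketch of the suggestion above, assuming a simplified hook interface: `CompileHook`, `CompileContext` (with its `cboInfo` field) and `CboInfoCarryOverHook` are hypothetical stand-ins for `QueryLifeTimeHook` and the driver `Context`, not Hive's real signatures. The hook stashes the CBO info when compilation fails and puts it back into the fresh context before the next compilation starts.

```java
/** Hypothetical, simplified per-compilation context (stands in for the driver Context). */
class CompileContext {
  String cboInfo;
}

/** Hypothetical, simplified version of the beforeCompile/afterCompile callbacks. */
interface CompileHook {
  void beforeCompile(CompileContext ctx);

  void afterCompile(CompileContext ctx, boolean hasError);
}

/** Carries cboInfo from a failed compilation over to the next attempt. */
class CboInfoCarryOverHook implements CompileHook {
  private String savedCboInfo;

  @Override
  public void beforeCompile(CompileContext ctx) {
    if (savedCboInfo != null) {
      ctx.cboInfo = savedCboInfo;  // restore into the freshly created context
      savedCboInfo = null;         // restore only once
    }
  }

  @Override
  public void afterCompile(CompileContext ctx, boolean hasError) {
    if (hasError) {
      savedCboInfo = ctx.cboInfo;  // stash it before the context is thrown away
    }
  }

  public static void main(String[] args) {
    CboInfoCarryOverHook hook = new CboInfoCarryOverHook();

    CompileContext first = new CompileContext();
    hook.beforeCompile(first);
    first.cboInfo = "CBO failed";  // set by the failed compilation
    hook.afterCompile(first, true);

    CompileContext second = new CompileContext();  // new context for the retry
    hook.beforeCompile(second);
    System.out.println(second.cboInfo);  // CBO failed
  }
}
```

Keeping the state inside the hook avoids threading `originalCboInfo` through `Driver.prepareContext`, which is what the diff line above does.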

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/reexec/ReExecutionCBOPlugin.java
##########
@@ -0,0 +1,85 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.reexec;
+
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.Driver;
+import org.apache.hadoop.hive.ql.hooks.QueryLifeTimeHook;
+import org.apache.hadoop.hive.ql.hooks.QueryLifeTimeHookContext;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * Re-compiles the query without CBO
+ */
+public class ReExecutionCBOPlugin implements IReExecutionPlugin {

Review comment:
       Can we rename this class to align more closely with the name of the config option, `recompile_without_cbo`?




[GitHub] [hive] pvary commented on a change in pull request #2865: HIVE-25792: Multi Insert query fails on CBO path

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #2865:
URL: https://github.com/apache/hive/pull/2865#discussion_r769635930



##########
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##########
@@ -5536,10 +5536,12 @@ private static void populateLlapDaemonVarsSet(Set<String> llapDaemonVarsSetLocal
 
     HIVE_QUERY_REEXECUTION_ENABLED("hive.query.reexecution.enabled", true,
         "Enable query reexecutions"),
-    HIVE_QUERY_REEXECUTION_STRATEGIES("hive.query.reexecution.strategies", "overlay,reoptimize,reexecute_lost_am,dagsubmit",
+    HIVE_QUERY_REEXECUTION_STRATEGIES("hive.query.reexecution.strategies",
+        "overlay,reoptimize,reexecute_lost_am,dagsubmit,recompile_without_cbo",
         "comma separated list of plugin can be used:\n"
             + "  overlay: hiveconf subtree 'reexec.overlay' is used as an overlay in case of an execution errors out\n"
             + "  reoptimize: collects operator statistics during execution and recompile the query after a failure\n"
+            + "  recompile_without_cbo: recompiles query after a CBO failure\n"
             + "  reexecute_lost_am: reexecutes query if it failed due to tez am node gets decommissioned"),

Review comment:
       Added tests




[GitHub] [hive] pvary commented on a change in pull request #2865: HIVE-25792: Multi Insert query fails on CBO path

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #2865:
URL: https://github.com/apache/hive/pull/2865#discussion_r768598009



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/reexec/ReExecDriver.java
##########
@@ -190,52 +229,21 @@ public CommandProcessorResponse run() throws CommandProcessorException {
       }
 
       PlanMapper newPlanMapper = coreDriver.getPlanMapper();
-      if (!explainReOptimization && !shouldReExecuteAfterCompile(oldPlanMapper, newPlanMapper)) {
+      if (!explainReOptimization &&
+          !plugins.stream().anyMatch(p -> p.shouldReExecute(executionIndex, oldPlanMapper, newPlanMapper))) {
         LOG.info("re-running the query would probably not yield better results; returning with last error");
         // FIXME: retain old error; or create a new one?
         return cpr;
       }
     }
   }
 
-  private void afterExecute(PlanMapper planMapper, boolean success) {
-    for (IReExecutionPlugin p : plugins) {
-      p.afterExecute(planMapper, success);
-    }
-  }
-
-  private boolean shouldReExecuteAfterCompile(PlanMapper oldPlanMapper, PlanMapper newPlanMapper) {
-    boolean ret = false;
-    for (IReExecutionPlugin p : plugins) {
-      boolean shouldReExecute = p.shouldReExecute(executionIndex, oldPlanMapper, newPlanMapper);
-      LOG.debug("{}.shouldReExecuteAfterCompile = {}", p, shouldReExecute);
-      ret |= shouldReExecute;
-    }
-    return ret;
-  }
-
-  private boolean shouldReExecute() {
-    boolean ret = false;
-    for (IReExecutionPlugin p : plugins) {
-      boolean shouldReExecute = p.shouldReExecute(executionIndex);
-      LOG.debug("{}.shouldReExecute = {}", p, shouldReExecute);

Review comment:
       Same as above




[GitHub] [hive] pvary commented on a change in pull request #2865: HIVE-25792: Multi Insert query fails on CBO path

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #2865:
URL: https://github.com/apache/hive/pull/2865#discussion_r769314666



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/reexec/ReExecutionCBOPlugin.java
##########
@@ -0,0 +1,97 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.reexec;
+
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.Driver;
+import org.apache.hadoop.hive.ql.hooks.QueryLifeTimeHook;
+import org.apache.hadoop.hive.ql.hooks.QueryLifeTimeHookContext;
+import org.apache.hadoop.hive.ql.parse.CBOException;
+import org.apache.hadoop.hive.ql.parse.SemanticException;
+
+/**
+ * Re-compiles the query without CBO
+ */
+public class ReExecutionCBOPlugin implements IReExecutionPlugin {
+
+  private Driver driver;
+  private boolean retryPossible = false;
+  private CBOFallbackStrategy fallbackStrategy;
+
+  class LocalHook implements QueryLifeTimeHook {
+    @Override
+    public void beforeCompile(QueryLifeTimeHookContext ctx) {
+      // noop
+    }
+
+    @Override
+    public void afterCompile(QueryLifeTimeHookContext ctx, boolean hasError) {
+      if (hasError) {
+        Throwable throwable = ctx.getHookContext().getException();
+        if (throwable != null) {
+          if (throwable instanceof CBOException) {
+            // Determine if we should re-throw the exception OR if we retry planning with non-CBO.
+            if (fallbackStrategy.isFatal((CBOException) throwable)) {
+              Throwable cause = throwable.getCause();
+              if (cause instanceof RuntimeException || cause instanceof SemanticException) {
+                // These types of exceptions do not need wrapped
+                retryPossible = false;
+                return;
+              }
+              // Wrap all other errors (Should only hit in tests)
+              throw new RuntimeException(cause);
+            } else {
+              // Only if the exception is a CBOException then we can retry
+              retryPossible = true;

Review comment:
       Done
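The branching in `afterCompile` in the diff above boils down to a small pure function. A sketch, assuming simplified types: `CboException`, `SemanticError`, `CompileOutcome` and `RetryDecision` are hypothetical stand-ins for `CBOException`, `SemanticException` and the `retryPossible` flag, and the `isFatal` predicate stands in for `fallbackStrategy.isFatal(...)`.

```java
import java.util.function.Predicate;

/** Hypothetical stand-in for CBOException: wraps the error that made CBO fail. */
class CboException extends Exception {
  CboException(Throwable cause) {
    super(cause);
  }
}

/** Hypothetical stand-in for SemanticException. */
class SemanticError extends Exception {
  SemanticError(String message) {
    super(message);
  }
}

enum CompileOutcome { RETRY_WITHOUT_CBO, FAIL }

class RetryDecision {
  /**
   * Mirrors the branches of afterCompile in the diff above.
   *
   * @param error   the exception thrown during compilation
   * @param isFatal stands in for fallbackStrategy.isFatal(...)
   */
  static CompileOutcome decide(Throwable error, Predicate<Throwable> isFatal) {
    if (!(error instanceof CboException)) {
      return CompileOutcome.FAIL;  // only CBO failures are candidates for a retry
    }
    if (!isFatal.test(error)) {
      return CompileOutcome.RETRY_WITHOUT_CBO;
    }
    Throwable cause = error.getCause();
    if (cause instanceof RuntimeException || cause instanceof SemanticError) {
      return CompileOutcome.FAIL;  // reported as-is, no wrapping needed
    }
    throw new RuntimeException(cause);  // other causes are wrapped ("should only hit in tests")
  }

  public static void main(String[] args) {
    Throwable cboError = new CboException(new IllegalStateException("cannot plan"));
    System.out.println(decide(cboError, e -> false));  // RETRY_WITHOUT_CBO
    System.out.println(decide(cboError, e -> true));   // FAIL
  }
}
```

Writing the decision as a function of its inputs makes each branch unit-testable without a running driver.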




[GitHub] [hive] pvary commented on a change in pull request #2865: HIVE-25792: Multi Insert query fails on CBO path

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #2865:
URL: https://github.com/apache/hive/pull/2865#discussion_r769594565



##########
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##########
@@ -5561,7 +5563,8 @@ private static void populateLlapDaemonVarsSet(Set<String> llapDaemonVarsSetLocal
         "Size of the runtime statistics cache. Unit is: OperatorStat entry; a query plan consist ~100."),
     HIVE_QUERY_PLANMAPPER_LINK_RELNODES("hive.query.planmapper.link.relnodes", true,
         "Whether to link Calcite nodes to runtime statistics."),
-
+    HIVE_QUERY_MAX_RECOMPILATION_COUNT("hive.query.recompilation.max.count", 1,
+        "Maximum number of re-compilations for a single query."),

Review comment:
       That was only for symmetry. I could remove it and always recompile only once. However, I am not sure the same argument doesn't also apply to `HIVE_QUERY_MAX_REEXECUTION_COUNT`.
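With the default of 1, `hive.query.recompilation.max.count` amounts to "recompile at most once". A sketch of how such a bound might be honored, assuming a hypothetical `BoundedRecompile.compile` helper where each attempt is a function of its index; this is not the code in the PR.

```java
import java.util.function.IntFunction;

class BoundedRecompile {
  /** Hypothetical failure type for an attempt that may be retried. */
  static class RetryableFailure extends RuntimeException {
    RetryableFailure(String message) {
      super(message);
    }
  }

  /**
   * Runs the first compilation plus at most maxRecompilations further attempts.
   *
   * @param attempt hypothetical compile step; receives the attempt index (0 = first compilation)
   */
  static String compile(IntFunction<String> attempt, int maxRecompilations) {
    if (maxRecompilations < 0) {
      throw new IllegalArgumentException("maxRecompilations must be >= 0");
    }
    RetryableFailure last = null;
    for (int i = 0; i <= maxRecompilations; i++) {
      try {
        return attempt.apply(i);
      } catch (RetryableFailure e) {
        last = e;  // keep the most recent error; retry if the budget allows
      }
    }
    throw last;  // budget exhausted: surface the last error
  }

  public static void main(String[] args) {
    IntFunction<String> failsOnce = i -> {
      if (i == 0) {
        throw new RetryableFailure("CBO failed");
      }
      return "compiled on attempt " + i;
    };
    System.out.println(compile(failsOnce, 1));  // compiled on attempt 1
  }
}
```

Since recompiling without CBO is not expected to hit the same CBO failure again, a budget above 1 mostly adds configuration surface, which is the trade-off discussed above.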




[GitHub] [hive] pvary commented on a change in pull request #2865: HIVE-25792: Multi Insert query fails on CBO path

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #2865:
URL: https://github.com/apache/hive/pull/2865#discussion_r767584360



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/DriverUtils.java
##########
@@ -171,4 +174,24 @@ public static String getUserFromUGI(DriverContext driverContext) throws CommandP
       throw createProcessorException(driverContext, 10, errorMessage, ErrorMsg.findSQLState(e.getMessage()), e);
     }
   }
+
+  public static HookContext getHookContext(DriverContext driverContext, Context context) {

Review comment:
       Good point.
   Created a new `PrivateHookContext` constructor.

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/DriverUtils.java
##########
@@ -171,4 +174,24 @@ public static String getUserFromUGI(DriverContext driverContext) throws CommandP
       throw createProcessorException(driverContext, 10, errorMessage, ErrorMsg.findSQLState(e.getMessage()), e);
     }
   }
+
+  public static HookContext getHookContext(DriverContext driverContext, Context context) {
+    String host = "Unknown";
+    try {
+      host = InetAddress.getLocalHost().getHostAddress();
+    } catch (UnknownHostException e) {
+      LOG.warn("Unable to get host", e);

Review comment:
       Ok

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/DriverUtils.java
##########
@@ -171,4 +174,24 @@ public static String getUserFromUGI(DriverContext driverContext) throws CommandP
       throw createProcessorException(driverContext, 10, errorMessage, ErrorMsg.findSQLState(e.getMessage()), e);
     }
   }
+
+  public static HookContext getHookContext(DriverContext driverContext, Context context) {
+    String host = "Unknown";
+    try {
+      host = InetAddress.getLocalHost().getHostAddress();
+    } catch (UnknownHostException e) {
+      LOG.warn("Unable to get host", e);
+    }
+
+    try {
+      return new PrivateHookContext(driverContext.getPlan(), driverContext.getQueryState(),
+          context.getPathToCS(), SessionState.get().getUserName(), SessionState.get().getUserIpAddress(),
+          host, driverContext.getOperationId(),
+          SessionState.get().getSessionId(), Thread.currentThread().getName(), SessionState.get().isHiveServerQuery(),
+          SessionState.getPerfLogger(), driverContext.getQueryInfo(), context);
+    } catch (Exception e) {
+      LOG.warn("Unable to create hook context");
+      return null;

Review comment:
       Ok




[GitHub] [hive] pvary commented on a change in pull request #2865: HIVE-25792: Multi Insert query fails on CBO path

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #2865:
URL: https://github.com/apache/hive/pull/2865#discussion_r768592824



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/reexec/ReExecDriver.java
##########
@@ -167,20 +201,25 @@ public CommandProcessorResponse run() throws CommandProcessorException {
       }
 
       PlanMapper oldPlanMapper = coreDriver.getPlanMapper();
-      afterExecute(oldPlanMapper, cpr != null);
+      final boolean success = cpr != null;
+      plugins.forEach(p -> p.afterExecute(oldPlanMapper, success));
+
+      // If the execution was successful return the result
+      if (success) {
+        return cpr;
+      }
 
       boolean shouldReExecute = explainReOptimization && executionIndex==1;
-      shouldReExecute |= cpr == null && shouldReExecute();
+      shouldReExecute |= plugins.stream().anyMatch(p -> p.shouldReExecute(executionIndex));
 
-      if (executionIndex >= maxExecutuions || !shouldReExecute) {
-        if (cpr != null) {
-          return cpr;
-        } else {
-          throw cpe;
-        }
+      if (executionIndex >= maxExecutions || !shouldReExecute) {
+        // If we do not have to reexecute, return the last error
+        throw cpe;
       }
+
       LOG.info("Preparing to re-execute query");
-      prepareToReExecute();
+      plugins.forEach(IReExecutionPlugin::prepareToReExecute);
+
       try {
         coreDriver.compileAndRespond(currentQuery);

Review comment:
       Yeah, that's ok




[GitHub] [hive] pvary commented on a change in pull request #2865: HIVE-25792: Multi Insert query fails on CBO path

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #2865:
URL: https://github.com/apache/hive/pull/2865#discussion_r770592513



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java
##########
@@ -683,20 +681,17 @@ Operator genOPTree(ASTNode ast, PlannerContext plannerCtx) throws SemanticExcept
             // Wrap all other errors (Should only hit in tests)
             throw new SemanticException(e);
           } else {
-            reAnalyzeAST = true;
+            String strategies = conf.getVar(ConfVars.HIVE_QUERY_REEXECUTION_STRATEGIES);
+            if (strategies == null || !Arrays.stream(strategies.split(",")).anyMatch("recompile_without_cbo"::equals)) {
+              throw new SemanticException("Invalid configuration. If fallbackStrategy is set to " + fallbackStrategy.name() +
+                  " then " + ConfVars.HIVE_QUERY_REEXECUTION_STRATEGIES.varname + " should contain 'recompile_without_cbo'");

Review comment:
       Moved the check to `ReExecDriver.checkHookConfig`, which now throws an exception whenever the config is not correct.
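A minimal sketch of such a fail-fast check. `ReExecConfigCheck`, `check` and the `fallbackRequested` flag are hypothetical (how the real `checkHookConfig` decides that a CBO fallback is requested is not shown in this thread); the strategy parsing mirrors the `Arrays.stream(strategies.split(","))` check removed from `CalcitePlanner` above, with an added `trim()` as an assumption.

```java
import java.util.Arrays;

class ReExecConfigCheck {
  static final String RECOMPILE_WITHOUT_CBO = "recompile_without_cbo";

  /** True if the comma-separated strategy list enables recompile_without_cbo. */
  static boolean hasRecompileWithoutCbo(String strategies) {
    return strategies != null
        && Arrays.stream(strategies.split(","))
            .map(String::trim)  // tolerate "a, b" as well as "a,b" (assumption)
            .anyMatch(RECOMPILE_WITHOUT_CBO::equals);
  }

  /** Fails fast when a CBO fallback is requested but the re-execution plugin is not enabled. */
  static void check(boolean fallbackRequested, String strategies) {
    if (fallbackRequested && !hasRecompileWithoutCbo(strategies)) {
      throw new IllegalStateException("Invalid configuration: hive.query.reexecution.strategies"
          + " should contain '" + RECOMPILE_WITHOUT_CBO + "'");
    }
  }

  public static void main(String[] args) {
    check(true, "overlay,reoptimize,recompile_without_cbo");  // passes silently
    try {
      check(true, "overlay,reoptimize");
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Validating once, up front, means a misconfiguration surfaces immediately instead of only when the first query happens to fail in CBO.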




[GitHub] [hive] kgyrtkirk commented on a change in pull request #2865: HIVE-25792: Multi Insert query fails on CBO path

Posted by GitBox <gi...@apache.org>.
kgyrtkirk commented on a change in pull request #2865:
URL: https://github.com/apache/hive/pull/2865#discussion_r767488415



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/DriverUtils.java
##########
@@ -171,4 +174,24 @@ public static String getUserFromUGI(DriverContext driverContext) throws CommandP
       throw createProcessorException(driverContext, 10, errorMessage, ErrorMsg.findSQLState(e.getMessage()), e);
     }
   }
+
+  public static HookContext getHookContext(DriverContext driverContext, Context context) {

Review comment:
       Looks like a constructor to me.

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/HookRunner.java
##########
@@ -121,19 +121,27 @@ void runBeforeCompileHook(String command) {
   }
 
   /**
-  * Dispatches {@link QueryLifeTimeHook#afterCompile(QueryLifeTimeHookContext, boolean)}.
-  *
-  * @param command the Hive command that is being run
-  * @param compileError true if there was an error while compiling the command, false otherwise
-  */
-  void runAfterCompilationHook(String command, boolean compileError) {
+   * Dispatches {@link QueryLifeTimeHook#afterCompile(QueryLifeTimeHookContext, boolean)}.
+   *
+   * @param driverContext the DriverContext used for generating the HookContext
+   * @param analyzerContext the SemanticAnalyzer context for this query
+   * @param compileException the exception if one was thrown during the compilation
+   */
+  void runAfterCompilationHook(DriverContext driverContext, Context analyzerContext, Throwable compileException) {

Review comment:
       This looks off... why do we have to pass 2 Context objects? Isn't it the case that one of them already knows about the other?
   

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/reexec/ReExecDriver.java
##########
@@ -190,52 +229,21 @@ public CommandProcessorResponse run() throws CommandProcessorException {
       }
 
       PlanMapper newPlanMapper = coreDriver.getPlanMapper();
-      if (!explainReOptimization && !shouldReExecuteAfterCompile(oldPlanMapper, newPlanMapper)) {
+      if (!explainReOptimization &&
+          !plugins.stream().anyMatch(p -> p.shouldReExecute(executionIndex, oldPlanMapper, newPlanMapper))) {
         LOG.info("re-running the query would probably not yield better results; returning with last error");
         // FIXME: retain old error; or create a new one?
         return cpr;
       }
     }
   }
 
-  private void afterExecute(PlanMapper planMapper, boolean success) {
-    for (IReExecutionPlugin p : plugins) {
-      p.afterExecute(planMapper, success);
-    }
-  }
-
-  private boolean shouldReExecuteAfterCompile(PlanMapper oldPlanMapper, PlanMapper newPlanMapper) {
-    boolean ret = false;
-    for (IReExecutionPlugin p : plugins) {
-      boolean shouldReExecute = p.shouldReExecute(executionIndex, oldPlanMapper, newPlanMapper);
-      LOG.debug("{}.shouldReExecuteAfterCompile = {}", p, shouldReExecute);

Review comment:
       Undo this change, as the old method also provided more details and is more debugger-friendly.
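The difference the reviewer points at is easy to see in isolation. A sketch with hypothetical names (`PluginVote`, and a `List<String>` standing in for the logger): the loop consults every plugin and records each answer, whereas `plugins.stream().anyMatch(...)` is short-circuiting and stops at the first `true`, so later plugins are neither called nor logged.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

class PluginVote {
  /**
   * Asks every plugin and records each answer, like the removed loops that called
   * LOG.debug("{}.shouldReExecute = {}", p, shouldReExecute). The list stands in for the logger.
   */
  static <P> boolean anyAdvises(List<P> plugins, Predicate<P> advise, List<String> log) {
    boolean ret = false;
    for (P p : plugins) {
      boolean shouldReExecute = advise.test(p);
      log.add(p + ".shouldReExecute = " + shouldReExecute);
      ret |= shouldReExecute;  // no short-circuit: every plugin is consulted
    }
    return ret;
  }

  public static void main(String[] args) {
    List<String> plugins = Arrays.asList("overlay", "reoptimize", "recompile_without_cbo");
    List<String> log = new ArrayList<>();
    boolean loop = anyAdvises(plugins, p -> p.startsWith("re"), log);
    log.forEach(System.out::println);  // three lines, one per plugin

    int[] calls = {0};
    boolean stream = plugins.stream().anyMatch(p -> {
      calls[0]++;
      return p.startsWith("re");
    });
    // anyMatch stops at the first match ("reoptimize"), so only 2 of 3 plugins were asked.
    System.out.println(loop + " " + stream + " " + calls[0]);  // true true 2
  }
}
```

Besides losing the per-plugin debug line, short-circuiting matters if a plugin's `shouldReExecute` has side effects, since those plugins may silently not be called.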

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/reexec/ReExecDriver.java
##########
@@ -190,52 +229,21 @@ public CommandProcessorResponse run() throws CommandProcessorException {
       }
 
       PlanMapper newPlanMapper = coreDriver.getPlanMapper();
-      if (!explainReOptimization && !shouldReExecuteAfterCompile(oldPlanMapper, newPlanMapper)) {
+      if (!explainReOptimization &&
+          !plugins.stream().anyMatch(p -> p.shouldReExecute(executionIndex, oldPlanMapper, newPlanMapper))) {
         LOG.info("re-running the query would probably not yield better results; returning with last error");
         // FIXME: retain old error; or create a new one?
         return cpr;
       }
     }
   }
 
-  private void afterExecute(PlanMapper planMapper, boolean success) {
-    for (IReExecutionPlugin p : plugins) {
-      p.afterExecute(planMapper, success);
-    }
-  }
-
-  private boolean shouldReExecuteAfterCompile(PlanMapper oldPlanMapper, PlanMapper newPlanMapper) {
-    boolean ret = false;
-    for (IReExecutionPlugin p : plugins) {
-      boolean shouldReExecute = p.shouldReExecute(executionIndex, oldPlanMapper, newPlanMapper);

Review comment:
       rename `p.shouldReExecute` to `p.shouldReExecuteAfterCompile`

##########
File path: ql/src/test/org/apache/hadoop/hive/ql/optimizer/calcite/TestCBOReCompilation.java
##########
@@ -0,0 +1,115 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.optimizer.calcite;
+
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.conf.HiveConf.ConfVars;
+import org.apache.hadoop.hive.ql.DriverFactory;
+import org.apache.hadoop.hive.ql.IDriver;
+import org.apache.hadoop.hive.ql.processors.CommandProcessorException;
+import org.apache.hadoop.hive.ql.session.SessionState;
+import org.apache.hive.testutils.HiveTestEnvSetup;
+import org.junit.AfterClass;
+import org.junit.Assert;
+import org.junit.BeforeClass;
+import org.junit.ClassRule;
+import org.junit.Test;
+
+public class TestCBOReCompilation {
+
+  @ClassRule
+  public static HiveTestEnvSetup env_setup = new HiveTestEnvSetup();
+
+  @BeforeClass
+  public static void beforeClass() throws Exception {
+    try (IDriver driver = createDriver()) {
+      dropTables(driver);
+      String[] cmds = {
+          // @formatter:off
+          "create table aa1 ( stf_id string)",
+          "create table bb1 ( stf_id string)",
+          "create table cc1 ( stf_id string)",
+          "create table ff1 ( x string)"
+          // @formatter:on
+      };
+      for (String cmd : cmds) {
+        driver.run(cmd);
+      }
+    }
+  }
+
+  @AfterClass
+  public static void afterClass() throws Exception {
+    try (IDriver driver = createDriver()) {
+      dropTables(driver);
+    }
+  }
+
+  public static void dropTables(IDriver driver) throws Exception {
+    String[] tables = new String[] {"aa1", "bb1", "cc1", "ff1" };
+    for (String t : tables) {
+      driver.run("drop table if exists " + t);
+    }
+  }
+
+  @Test
+  public void testReExecutedOnError() throws Exception {
+    try (IDriver driver = createDriver("ALWAYS")) {
+      String query = "explain from ff1 as a join cc1 as b " +
+          "insert overwrite table aa1 select   stf_id GROUP BY b.stf_id " +
+          "insert overwrite table bb1 select b.stf_id GROUP BY b.stf_id";
+      driver.run(query);
+    }
+  }
+
+  @Test
+  public void testFailOnError() throws Exception {
+    try (IDriver driver = createDriver("TEST")) {
+      String query = "explain from ff1 as a join cc1 as b " +
+          "insert overwrite table aa1 select   stf_id GROUP BY b.stf_id " +
+          "insert overwrite table bb1 select b.stf_id GROUP BY b.stf_id";
+      Assert.assertThrows("Plan not optimized by CBO", CommandProcessorException.class, () -> driver.run(query));

Review comment:
       These are bad tests, because they kind of set in stone that we should not fix this issue.
   
   If we let this in, then whoever fixes this issue will also be forced to invent a new query for this test which:
   * fails in CBO
   * does not fail without CBO
   
   I don't think that's fair, so a different test is needed.
   
   But I understand that this is something which should be handled... right now I don't have any good ideas about what we could do instead of this. I'll keep thinking about it...

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/reexec/ReExecDriver.java
##########
@@ -167,20 +201,25 @@ public CommandProcessorResponse run() throws CommandProcessorException {
       }
 
       PlanMapper oldPlanMapper = coreDriver.getPlanMapper();
-      afterExecute(oldPlanMapper, cpr != null);
+      final boolean success = cpr != null;
+      plugins.forEach(p -> p.afterExecute(oldPlanMapper, success));
+
+      // If the execution was successful return the result
+      if (success) {

Review comment:
       You are missing the re-executions for `explainReOptimization`: returning early on success skips them.
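A successful first run still has to be re-executed when `explainReOptimization` is set, so the success check cannot simply return before that condition is evaluated. A sketch of the decision as a pure function; `ReExecDecision` and `shouldRunAgain` are hypothetical names, and the conditions follow the original lines in the hunk above (`explainReOptimization && executionIndex==1`, plugins consulted only when `cpr == null`, and the `maxExecutions` bound).

```java
class ReExecDecision {
  /**
   * Decides whether the query should be run again after an attempt.
   *
   * @param success               whether the attempt produced a response (cpr != null)
   * @param explainReOptimization whether this is an EXPLAIN REOPTIMIZATION run
   * @param executionIndex        index of the attempt that just finished (1 for the first run)
   * @param maxExecutions         upper bound on the number of executions
   * @param anyPluginAdvises      result of asking the plugins (only relevant on failure)
   */
  static boolean shouldRunAgain(boolean success, boolean explainReOptimization,
      int executionIndex, int maxExecutions, boolean anyPluginAdvises) {
    if (executionIndex >= maxExecutions) {
      return false;
    }
    // EXPLAIN REOPTIMIZATION re-runs after the first execution even when it succeeded,
    // so this must be evaluated before any "return on success" shortcut.
    boolean again = explainReOptimization && executionIndex == 1;
    if (!success) {
      again |= anyPluginAdvises;  // plugins are consulted only for failed attempts
    }
    return again;
  }

  public static void main(String[] args) {
    System.out.println(shouldRunAgain(true, true, 1, 2, false));   // true
    System.out.println(shouldRunAgain(true, false, 1, 2, true));   // false
    System.out.println(shouldRunAgain(false, false, 1, 2, true));  // true
  }
}
```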
   

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/reexec/IReExecutionPlugin.java
##########
@@ -42,24 +42,72 @@
   /**
    * Called before executing the query.
    */
-  void beforeExecute(int executionIndex, boolean explainReOptimization);
+  default void beforeExecute(int executionIndex, boolean explainReOptimization) {
+    // default noop
+  }
 
   /**
    * The query have failed, does this plugin advises to re-execute it again?
    */
-  boolean shouldReExecute(int executionNum);
+  default boolean shouldReExecute(int executionNum) {

Review comment:
       why do we have 2 `shouldReExecute` methods? 
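The two overloads answer different questions at different points of the run loop: one after a failed execution, the other after recompiling (compare the two call sites in the `ReExecDriver` hunks above). A sketch of how distinct names plus default methods could make that explicit, using the `shouldReExecuteAfterCompile` name suggested in another comment; `ReExecPlugin` and `RetryFirstFailureOnly` are hypothetical, not the PR's `IReExecutionPlugin`, and the `false` defaults are an assumption.

```java
/**
 * Hypothetical, trimmed-down plugin interface. Object stands in for PlanMapper, and
 * the defaults (false) are an assumption of this sketch.
 */
interface ReExecPlugin {
  /** Asked after a failed execution: should the query be recompiled and run again? */
  default boolean shouldReExecute(int executionNum) {
    return false;
  }

  /** Asked after recompiling: is the new plan worth executing? */
  default boolean shouldReExecuteAfterCompile(int executionNum, Object oldPlan, Object newPlan) {
    return false;
  }
}

/** Overrides only the first question and inherits the default for the second. */
class RetryFirstFailureOnly implements ReExecPlugin {
  @Override
  public boolean shouldReExecute(int executionNum) {
    return executionNum == 1;
  }

  public static void main(String[] args) {
    ReExecPlugin p = new RetryFirstFailureOnly();
    System.out.println(p.shouldReExecute(1));                          // true
    System.out.println(p.shouldReExecute(2));                          // false
    System.out.println(p.shouldReExecuteAfterCompile(1, null, null));  // false
  }
}
```

With distinct names, a plugin that overrides one question cannot be mistaken for one that overrides the other, which is harder to see with two overloads of the same name.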

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/reexec/ReExecDriver.java
##########
@@ -167,20 +201,25 @@ public CommandProcessorResponse run() throws CommandProcessorException {
       }
 
       PlanMapper oldPlanMapper = coreDriver.getPlanMapper();
-      afterExecute(oldPlanMapper, cpr != null);
+      final boolean success = cpr != null;
+      plugins.forEach(p -> p.afterExecute(oldPlanMapper, success));
+
+      // If the execution was successful return the result
+      if (success) {
+        return cpr;
+      }
 
       boolean shouldReExecute = explainReOptimization && executionIndex==1;
-      shouldReExecute |= cpr == null && shouldReExecute();
+      shouldReExecute |= plugins.stream().anyMatch(p -> p.shouldReExecute(executionIndex));
 
-      if (executionIndex >= maxExecutuions || !shouldReExecute) {
-        if (cpr != null) {
-          return cpr;
-        } else {
-          throw cpe;
-        }
+      if (executionIndex >= maxExecutions || !shouldReExecute) {
+        // If we do not have to reexecute, return the last error
+        throw cpe;
       }
+
       LOG.info("Preparing to re-execute query");
-      prepareToReExecute();
+      plugins.forEach(IReExecutionPlugin::prepareToReExecute);

Review comment:
       Undo these changes, as the old methods also provided more details.

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/reexec/ReExecDriver.java
##########
@@ -190,52 +229,21 @@ public CommandProcessorResponse run() throws CommandProcessorException {
       }
 
       PlanMapper newPlanMapper = coreDriver.getPlanMapper();
-      if (!explainReOptimization && !shouldReExecuteAfterCompile(oldPlanMapper, newPlanMapper)) {
+      if (!explainReOptimization &&
+          !plugins.stream().anyMatch(p -> p.shouldReExecute(executionIndex, oldPlanMapper, newPlanMapper))) {
         LOG.info("re-running the query would probably not yield better results; returning with last error");
         // FIXME: retain old error; or create a new one?
         return cpr;
       }
     }
   }
 
-  private void afterExecute(PlanMapper planMapper, boolean success) {
-    for (IReExecutionPlugin p : plugins) {
-      p.afterExecute(planMapper, success);
-    }
-  }
-
-  private boolean shouldReExecuteAfterCompile(PlanMapper oldPlanMapper, PlanMapper newPlanMapper) {
-    boolean ret = false;
-    for (IReExecutionPlugin p : plugins) {
-      boolean shouldReExecute = p.shouldReExecute(executionIndex, oldPlanMapper, newPlanMapper);
-      LOG.debug("{}.shouldReExecuteAfterCompile = {}", p, shouldReExecute);
-      ret |= shouldReExecute;
-    }
-    return ret;
-  }
-
-  private boolean shouldReExecute() {
-    boolean ret = false;
-    for (IReExecutionPlugin p : plugins) {
-      boolean shouldReExecute = p.shouldReExecute(executionIndex);
-      LOG.debug("{}.shouldReExecute = {}", p, shouldReExecute);

Review comment:
       Undo this change, as the old method also provided more details.

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/DriverUtils.java
##########
@@ -171,4 +174,24 @@ public static String getUserFromUGI(DriverContext driverContext) throws CommandP
       throw createProcessorException(driverContext, 10, errorMessage, ErrorMsg.findSQLState(e.getMessage()), e);
     }
   }
+
+  public static HookContext getHookContext(DriverContext driverContext, Context context) {
+    String host = "Unknown";
+    try {
+      host = InetAddress.getLocalHost().getHostAddress();
+    } catch (UnknownHostException e) {
+      LOG.warn("Unable to get host", e);

Review comment:
       This is a fatal error; there is no need to mask it.

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/DriverUtils.java
##########
@@ -171,4 +174,24 @@ public static String getUserFromUGI(DriverContext driverContext) throws CommandP
       throw createProcessorException(driverContext, 10, errorMessage, ErrorMsg.findSQLState(e.getMessage()), e);
     }
   }
+
+  public static HookContext getHookContext(DriverContext driverContext, Context context) {
+    String host = "Unknown";
+    try {
+      host = InetAddress.getLocalHost().getHostAddress();
+    } catch (UnknownHostException e) {
+      LOG.warn("Unable to get host", e);
+    }
+
+    try {
+      return new PrivateHookContext(driverContext.getPlan(), driverContext.getQueryState(),
+          context.getPathToCS(), SessionState.get().getUserName(), SessionState.get().getUserIpAddress(),
+          host, driverContext.getOperationId(),
+          SessionState.get().getSessionId(), Thread.currentThread().getName(), SessionState.get().isHiveServerQuery(),
+          SessionState.getPerfLogger(), driverContext.getQueryInfo(), context);
+    } catch (Exception e) {
+      LOG.warn("Unable to create hook context");
+      return null;

Review comment:
       This should be an exception; returning `null` will just make things worse if that happens.
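A minimal sketch of what the two comments on `getHookContext` ask for. It assumes a hypothetical checked `HookContextException` and a generic `Supplier` in place of the real `PrivateHookContext` constructor: a host lookup failure propagates instead of silently becoming "Unknown", and a failed or `null` construction becomes an exception instead of a `null` return.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.function.Supplier;

/** Hypothetical checked exception standing in for whatever Hive would actually throw here. */
class HookContextException extends Exception {
  HookContextException(String message, Throwable cause) {
    super(message, cause);
  }
}

/** Hypothetical helper: callers always get a value or an exception, never null. */
final class HookContexts {
  private HookContexts() {}

  /** Resolves the local host address; a failure is propagated, not masked as "Unknown". */
  static String localHostAddress() throws HookContextException {
    try {
      return InetAddress.getLocalHost().getHostAddress();
    } catch (UnknownHostException e) {
      throw new HookContextException("Unable to resolve the local host", e);
    }
  }

  /** Builds a context with the given factory; failures are rethrown with their cause attached. */
  static <T> T create(Supplier<T> factory) throws HookContextException {
    try {
      T ctx = factory.get();
      if (ctx == null) {
        // Not a RuntimeException, so it is not caught by the handler below.
        throw new HookContextException("Hook context factory returned null", null);
      }
      return ctx;
    } catch (RuntimeException e) {
      throw new HookContextException("Unable to create hook context", e);
    }
  }
}
```

Keeping the original exception as the cause means the caller's error report still points at the real construction failure.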

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/reexec/ReExecDriver.java
##########
@@ -115,14 +115,48 @@ public ReExecDriver(QueryState queryState, QueryInfo queryInfo, ArrayList<IReExe
     }
   }
 
+  // I think this should be used only in tests

Review comment:
       I don't really understand why this method was even necessary, but instead of the comment we could probably use `@VisibleForTesting`.

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/reexec/ReExecutionCBOPlugin.java
##########
@@ -0,0 +1,97 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.reexec;
+
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.Driver;
+import org.apache.hadoop.hive.ql.hooks.QueryLifeTimeHook;
+import org.apache.hadoop.hive.ql.hooks.QueryLifeTimeHookContext;
+import org.apache.hadoop.hive.ql.parse.CBOException;
+import org.apache.hadoop.hive.ql.parse.SemanticException;
+
+/**
+ * Re-compiles the query without CBO
+ */
+public class ReExecutionCBOPlugin implements IReExecutionPlugin {
+
+  private Driver driver;
+  private boolean retryPossible = false;
+  private CBOFallbackStrategy fallbackStrategy;
+
+  class LocalHook implements QueryLifeTimeHook {
+    @Override
+    public void beforeCompile(QueryLifeTimeHookContext ctx) {
+      // noop
+    }
+
+    @Override
+    public void afterCompile(QueryLifeTimeHookContext ctx, boolean hasError) {
+      if (hasError) {
+        Throwable throwable = ctx.getHookContext().getException();
+        if (throwable != null) {
+          if (throwable instanceof CBOException) {
+            // Determine if we should re-throw the exception OR if we retry planning with non-CBO.
+            if (fallbackStrategy.isFatal((CBOException) throwable)) {
+              Throwable cause = throwable.getCause();
+              if (cause instanceof RuntimeException || cause instanceof SemanticException) {
+                // These types of exceptions do not need wrapped
+                retryPossible = false;
+                return;
+              }
+              // Wrap all other errors (Should only hit in tests)
+              throw new RuntimeException(cause);
+            } else {
+              // Only if the exception is a CBOException then we can retry
+              retryPossible = true;

Review comment:
       Remove everything else and keep only the path to this assignment. A log message would probably be needed, as you are discarding the `CBOException` and retrying the query.
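A minimal sketch of the simplified hook, assuming a stand-in `StubCBOException` for Hive's `CBOException` and passing the exception in directly instead of reading it from `QueryLifeTimeHookContext`. `CboRetryTracker` is a hypothetical name. `retryPossible` is assigned on every call, and the discarded exception is logged before the retry.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Stand-in for org.apache.hadoop.hive.ql.parse.CBOException; only its type matters here. */
class StubCBOException extends Exception {
  StubCBOException(String message) {
    super(message);
  }
}

/** Hypothetical, trimmed-down version of the plugin's compile hook. */
class CboRetryTracker {
  private static final Logger LOG = Logger.getLogger(CboRetryTracker.class.getName());
  private boolean retryPossible;

  /** Mirrors the shape of QueryLifeTimeHook#afterCompile, with the exception passed explicitly. */
  void afterCompile(boolean hasError, Throwable throwable) {
    // Always assign, so the answer from a previous compilation never leaks into this one.
    retryPossible = hasError && throwable instanceof StubCBOException;
    if (retryPossible) {
      // The CBO failure is swallowed by the retry, so record it (with stack trace) first.
      LOG.log(Level.INFO, "CBO failed; recompiling the query without CBO", throwable);
    }
  }

  boolean isRetryPossible() {
    return retryPossible;
  }
}
```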

##########
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##########
@@ -5536,10 +5536,12 @@ private static void populateLlapDaemonVarsSet(Set<String> llapDaemonVarsSetLocal
 
     HIVE_QUERY_REEXECUTION_ENABLED("hive.query.reexecution.enabled", true,
         "Enable query reexecutions"),
-    HIVE_QUERY_REEXECUTION_STRATEGIES("hive.query.reexecution.strategies", "overlay,reoptimize,reexecute_lost_am,dagsubmit",
+    HIVE_QUERY_REEXECUTION_STRATEGIES("hive.query.reexecution.strategies",
+        "overlay,reoptimize,reexecute_lost_am,dagsubmit,reexecute_cbo",
         "comma separated list of plugin can be used:\n"
             + "  overlay: hiveconf subtree 'reexec.overlay' is used as an overlay in case of an execution errors out\n"
             + "  reoptimize: collects operator statistics during execution and recompile the query after a failure\n"
+            + "  reexecute_cbo: reexecutes query after a CBO failure\n"

Review comment:
       I think this name is misleading; I don't have a good name for it, but something like:
   * recompile without cbo
   * fallback to non-cbo path
   * non-cbo fallback
   
   

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/reexec/ReExecDriver.java
##########
@@ -167,20 +201,25 @@ public CommandProcessorResponse run() throws CommandProcessorException {
       }
 
       PlanMapper oldPlanMapper = coreDriver.getPlanMapper();
-      afterExecute(oldPlanMapper, cpr != null);
+      final boolean success = cpr != null;
+      plugins.forEach(p -> p.afterExecute(oldPlanMapper, success));
+
+      // If the execution was successful return the result
+      if (success) {
+        return cpr;
+      }
 
       boolean shouldReExecute = explainReOptimization && executionIndex==1;
-      shouldReExecute |= cpr == null && shouldReExecute();
+      shouldReExecute |= plugins.stream().anyMatch(p -> p.shouldReExecute(executionIndex));
 
-      if (executionIndex >= maxExecutuions || !shouldReExecute) {
-        if (cpr != null) {
-          return cpr;
-        } else {
-          throw cpe;
-        }
+      if (executionIndex >= maxExecutions || !shouldReExecute) {
+        // If we do not have to reexecute, return the last error
+        throw cpe;
       }
+
       LOG.info("Preparing to re-execute query");
-      prepareToReExecute();
+      plugins.forEach(IReExecutionPlugin::prepareToReExecute);
+
       try {
         coreDriver.compileAndRespond(currentQuery);

Review comment:
       During a re-execution recompile, is it OK to call `coreDriver`?

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/reexec/ReExecutionCBOPlugin.java
##########
@@ -0,0 +1,97 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.reexec;
+
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.Driver;
+import org.apache.hadoop.hive.ql.hooks.QueryLifeTimeHook;
+import org.apache.hadoop.hive.ql.hooks.QueryLifeTimeHookContext;
+import org.apache.hadoop.hive.ql.parse.CBOException;
+import org.apache.hadoop.hive.ql.parse.SemanticException;
+
+/**
+ * Re-compiles the query without CBO
+ */
+public class ReExecutionCBOPlugin implements IReExecutionPlugin {
+
+  private Driver driver;
+  private boolean retryPossible = false;
+  private CBOFallbackStrategy fallbackStrategy;
+
+  class LocalHook implements QueryLifeTimeHook {
+    @Override
+    public void beforeCompile(QueryLifeTimeHookContext ctx) {
+      // noop
+    }
+
+    @Override
+    public void afterCompile(QueryLifeTimeHookContext ctx, boolean hasError) {
+      if (hasError) {

Review comment:
       Set `retryPossible` to some value here instead of retaining the value from the previous run.

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/reexec/IReExecutionPlugin.java
##########
@@ -42,24 +42,72 @@
   /**
    * Called before executing the query.
    */
-  void beforeExecute(int executionIndex, boolean explainReOptimization);
+  default void beforeExecute(int executionIndex, boolean explainReOptimization) {
+    // default noop
+  }
 
   /**
    * The query have failed, does this plugin advises to re-execute it again?
    */
-  boolean shouldReExecute(int executionNum);
+  default boolean shouldReExecute(int executionNum) {
+    // default no
+    return false;
+  }
 
   /**
-   * The plugin should prepare for the re-compilaton of the query.
+   * The plugin should prepare for the re-compilation of the query.
    */
-  void prepareToReExecute();
+  default void prepareToReExecute() {
+    // default noop
+  }
 
   /**
-   * The query have failed; and have been recompiled - does this plugin advises to re-execute it again?
+   * The query has failed; and have been recompiled - does this plugin advises to re-execute it again?
    */
-  boolean shouldReExecute(int executionNum, PlanMapper oldPlanMapper, PlanMapper newPlanMapper);
+  default boolean shouldReExecute(int executionNum, PlanMapper oldPlanMapper, PlanMapper newPlanMapper) {

Review comment:
       Returning `false` at this point and not implementing this method would make a reexec plugin practically useless... so not implementing it is not an option.
   
   Please remove these default implementations for methods returning booleans;
   plugin implementors should answer these questions.
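The split asked for above can be sketched as an interface where only the side-effect hooks get no-op defaults and both boolean questions stay abstract. `StubPlanMapper`, `ReExecutionPluginSketch` and `RetryOncePlugin` are hypothetical names; the method signatures follow the `IReExecutionPlugin` diff quoted above.

```java
/** Stand-in for Hive's PlanMapper; only used as a parameter type here. */
final class StubPlanMapper {}

/** Sketch of the requested interface shape; signatures follow the diff above. */
interface ReExecutionPluginSketch {
  /** Side-effect-only hook: a no-op default is harmless. */
  default void beforeExecute(int executionIndex, boolean explainReOptimization) {
    // default noop
  }

  /** Side-effect-only hook: a no-op default is harmless. */
  default void prepareToReExecute() {
    // default noop
  }

  /** No default: every plugin must decide whether a failed query should be re-executed. */
  boolean shouldReExecute(int executionNum);

  /** No default: every plugin must decide whether the recompiled plan is worth running. */
  boolean shouldReExecute(int executionNum, StubPlanMapper oldPlanMapper, StubPlanMapper newPlanMapper);
}

/** Hypothetical plugin: it only compiles because it answers both boolean questions. */
class RetryOncePlugin implements ReExecutionPluginSketch {
  @Override
  public boolean shouldReExecute(int executionNum) {
    return executionNum < 2;
  }

  @Override
  public boolean shouldReExecute(int executionNum, StubPlanMapper oldPlanMapper, StubPlanMapper newPlanMapper) {
    return true;
  }
}
```

With this shape, a plugin that forgets either `shouldReExecute` overload fails to compile instead of silently voting `false`.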


[GitHub] [hive] pvary commented on a change in pull request #2865: HIVE-25792: Multi Insert query fails on CBO path

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #2865:
URL: https://github.com/apache/hive/pull/2865#discussion_r769749700



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/Driver.java
##########
@@ -515,14 +516,16 @@ private void preparForCompile(boolean resetTaskIds) throws CommandProcessorExcep
   }
 
   private void prepareContext() throws CommandProcessorException {
+    String originalCboInfo = context != null ? context.cboInfo : null;

Review comment:
       The issue here is that the context we prepared in the `ReCompilePlugin#Hook` will be closed here, and a new one will be recreated:
   ```
       if (context != null && context.getExplainAnalyze() != AnalyzeState.RUNNING) {
         // close the existing ctx etc before compiling a new query, but does not destroy driver
         closeInProcess(false);
       }
   
       if (context == null) {
         context = new Context(driverContext.getConf());
       }
   ```
   
   We have to save the cbo info pushed from the `ReCompilePlugin#Hook`
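The carry-over described above can be sketched with stand-in types. `StubContext` and `ContextHolder` are hypothetical simplifications, not Hive's `Context` and `Driver`; in particular, the real `prepareContext()` only closes the context when `getExplainAnalyze() != AnalyzeState.RUNNING`, which this sketch ignores.

```java
/** Stand-in for Hive's Context; only the CBO info field matters here. */
class StubContext {
  String cboInfo;
  boolean closed;

  void close() {
    closed = true;
  }
}

/** Hypothetical trimmed-down driver showing the carry-over done in prepareContext(). */
class ContextHolder {
  StubContext context;

  void prepareContext() {
    // Capture what the re-execution hook stored before the old context is discarded.
    String originalCboInfo = context != null ? context.cboInfo : null;
    if (context != null) {
      context.close();
    }
    context = new StubContext();
    // Restore it on the fresh context so the recompilation can still see it.
    context.cboInfo = originalCboInfo;
  }
}
```

The alternative raised earlier in this thread, saving the value in `Hook#afterCompile` and restoring it in the next `Hook#beforeCompile`, would keep this state inside the plugin rather than in the driver.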


[GitHub] [hive] pvary commented on a change in pull request #2865: HIVE-25792: Multi Insert query fails on CBO path

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #2865:
URL: https://github.com/apache/hive/pull/2865#discussion_r768597811



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/reexec/ReExecDriver.java
##########
@@ -190,52 +229,21 @@ public CommandProcessorResponse run() throws CommandProcessorException {
       }
 
       PlanMapper newPlanMapper = coreDriver.getPlanMapper();
-      if (!explainReOptimization && !shouldReExecuteAfterCompile(oldPlanMapper, newPlanMapper)) {
+      if (!explainReOptimization &&
+          !plugins.stream().anyMatch(p -> p.shouldReExecute(executionIndex, oldPlanMapper, newPlanMapper))) {
         LOG.info("re-running the query would probably not yield better results; returning with last error");
         // FIXME: retain old error; or create a new one?
         return cpr;
       }
     }
   }
 
-  private void afterExecute(PlanMapper planMapper, boolean success) {
-    for (IReExecutionPlugin p : plugins) {
-      p.afterExecute(planMapper, success);
-    }
-  }
-
-  private boolean shouldReExecuteAfterCompile(PlanMapper oldPlanMapper, PlanMapper newPlanMapper) {
-    boolean ret = false;
-    for (IReExecutionPlugin p : plugins) {
-      boolean shouldReExecute = p.shouldReExecute(executionIndex, oldPlanMapper, newPlanMapper);
-      LOG.debug("{}.shouldReExecuteAfterCompile = {}", p, shouldReExecute);

Review comment:
       TBH, I am unsure here. We can keep:
   - `shouldReExecuteAfterCompile`
   - `shouldReExecute`
   - `shouldReCompile`
   
   Or, we can replace with a stream version:
   ```
   plugins.stream()
                 .anyMatch(p -> {
                   boolean result = p.shouldReCompile(currentIndex);
                   LOG.debug("{}.shouldReCompile = {}", p, result);
                   return result;
                 })
   ```
   
   Or we can omit the logs, and use only:
   ```
   plugins.stream().anyMatch(p -> p.shouldReCompile(currentIndex))
   ```
   
   Your thoughts?
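If the per-plugin debug logging is worth keeping, a plain loop preserves both the logging and the old evaluate-every-plugin behavior (the removed loops used `ret |= ...`, while `Stream.anyMatch` stops at the first `true`). A sketch with a hypothetical `RecompileAdvisor` interface standing in for the plugin, and `java.util.logging` standing in for the SLF4J logger Hive uses:

```java
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical stand-in for the plugin question discussed above. */
interface RecompileAdvisor {
  boolean shouldReCompile(int executionIndex);
}

final class PluginVotes {
  private static final Logger LOG = Logger.getLogger(PluginVotes.class.getName());

  private PluginVotes() {}

  /**
   * Asks every advisor, logs each answer, and returns true if any said yes.
   * Unlike stream().anyMatch(...), this does not stop at the first "true",
   * so every plugin is consulted and logged, as in the old explicit loops.
   */
  static boolean anyShouldReCompile(List<? extends RecompileAdvisor> advisors, int executionIndex) {
    boolean result = false;
    for (RecompileAdvisor advisor : advisors) {
      boolean vote = advisor.shouldReCompile(executionIndex);
      LOG.log(Level.FINE, "{0}.shouldReCompile = {1}", new Object[] {advisor, vote});
      result |= vote;
    }
    return result;
  }
}
```

Whether consulting every plugin matters depends on whether any plugin relies on being asked; if none does, the short-circuiting `anyMatch` version is equivalent apart from the logging.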


[GitHub] [hive] pvary commented on a change in pull request #2865: HIVE-25792: Multi Insert query fails on CBO path

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #2865:
URL: https://github.com/apache/hive/pull/2865#discussion_r769813074



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/reexec/ReExecutionCBOPlugin.java
##########
@@ -0,0 +1,80 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.reexec;
+
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.Driver;
+import org.apache.hadoop.hive.ql.hooks.QueryLifeTimeHook;
+import org.apache.hadoop.hive.ql.hooks.QueryLifeTimeHookContext;
+
+/**
+ * Re-compiles the query without CBO
+ */
+public class ReExecutionCBOPlugin implements IReExecutionPlugin {
+
+  private Driver driver;
+  private boolean retryPossible;
+  private String cboMsg;
+
+  class LocalHook implements QueryLifeTimeHook {
+    @Override
+    public void beforeCompile(QueryLifeTimeHookContext ctx) {
+      // noop
+    }
+
+    @Override
+    public void afterCompile(QueryLifeTimeHookContext ctx, boolean hasError) {
+      if (hasError) {
+        Throwable throwable = ctx.getHookContext().getException();
+        retryPossible = throwable != null && throwable instanceof ReCompileException;

Review comment:
       Agreed. Added.


[GitHub] [hive] zabetak commented on a change in pull request #2865: HIVE-25792: Multi Insert query fails on CBO path

Posted by GitBox <gi...@apache.org>.
zabetak commented on a change in pull request #2865:
URL: https://github.com/apache/hive/pull/2865#discussion_r769846323



##########
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##########
@@ -5561,7 +5563,8 @@ private static void populateLlapDaemonVarsSet(Set<String> llapDaemonVarsSetLocal
         "Size of the runtime statistics cache. Unit is: OperatorStat entry; a query plan consist ~100."),
     HIVE_QUERY_PLANMAPPER_LINK_RELNODES("hive.query.planmapper.link.relnodes", true,
         "Whether to link Calcite nodes to runtime statistics."),
-
+    HIVE_QUERY_MAX_RECOMPILATION_COUNT("hive.query.recompilation.max.count", 1,
+        "Maximum number of re-compilations for a single query."),

Review comment:
       I tend to avoid adding new toggles unless it is necessary. They require testing and become part of the API that we need to maintain and support.


[GitHub] [hive] pvary commented on a change in pull request #2865: HIVE-25792: Multi Insert query fails on CBO path

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #2865:
URL: https://github.com/apache/hive/pull/2865#discussion_r768575045



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/reexec/IReExecutionPlugin.java
##########
@@ -42,24 +42,72 @@
   /**
    * Called before executing the query.
    */
-  void beforeExecute(int executionIndex, boolean explainReOptimization);
+  default void beforeExecute(int executionIndex, boolean explainReOptimization) {
+    // default noop
+  }
 
   /**
    * The query have failed, does this plugin advises to re-execute it again?
    */
-  boolean shouldReExecute(int executionNum);
+  default boolean shouldReExecute(int executionNum) {

Review comment:
       We discussed it and renamed the other method.


[GitHub] [hive] pvary commented on a change in pull request #2865: HIVE-25792: Multi Insert query fails on CBO path

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #2865:
URL: https://github.com/apache/hive/pull/2865#discussion_r768591706



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/reexec/ReExecDriver.java
##########
@@ -167,20 +201,25 @@ public CommandProcessorResponse run() throws CommandProcessorException {
       }
 
       PlanMapper oldPlanMapper = coreDriver.getPlanMapper();
-      afterExecute(oldPlanMapper, cpr != null);
+      final boolean success = cpr != null;
+      plugins.forEach(p -> p.afterExecute(oldPlanMapper, success));
+
+      // If the execution was successful return the result
+      if (success) {
+        return cpr;
+      }
 
       boolean shouldReExecute = explainReOptimization && executionIndex==1;
-      shouldReExecute |= cpr == null && shouldReExecute();
+      shouldReExecute |= plugins.stream().anyMatch(p -> p.shouldReExecute(executionIndex));
 
-      if (executionIndex >= maxExecutuions || !shouldReExecute) {
-        if (cpr != null) {
-          return cpr;
-        } else {
-          throw cpe;
-        }
+      if (executionIndex >= maxExecutions || !shouldReExecute) {
+        // If we do not have to reexecute, return the last error
+        throw cpe;
       }
+
       LOG.info("Preparing to re-execute query");
-      prepareToReExecute();
+      plugins.forEach(IReExecutionPlugin::prepareToReExecute);

Review comment:
       As discussed, most of them were just loops, so I would keep this instead of having six methods for loops.


[GitHub] [hive] pvary commented on a change in pull request #2865: HIVE-25792: Multi Insert query fails on CBO path

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #2865:
URL: https://github.com/apache/hive/pull/2865#discussion_r768590360



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/reexec/ReExecDriver.java
##########
@@ -115,14 +115,48 @@ public ReExecDriver(QueryState queryState, QueryInfo queryInfo, ArrayList<IReExe
     }
   }
 
+  // I think this should be used only in tests

Review comment:
       Added the annotation; I just wanted to check with you because this leaves out the ReCompile path.


[GitHub] [hive] zabetak commented on a change in pull request #2865: HIVE-25792: Multi Insert query fails on CBO path

Posted by GitBox <gi...@apache.org>.
zabetak commented on a change in pull request #2865:
URL: https://github.com/apache/hive/pull/2865#discussion_r769579986



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java
##########
@@ -673,8 +672,7 @@ Operator genOPTree(ASTNode ast, PlannerContext plannerCtx) throws SemanticExcept
           }
           this.ctx.setCboInfo(cboMsg);
 
-          // Determine if we should re-throw the exception OR if we try to mark plan as reAnalyzeAST to retry
-          // planning as non-CBO.
+          // Determine if we should re-throw the exception OR if we try to mark the query to retry as non-CBO.

Review comment:
       I am wondering if it would be better to just throw here and let the reexecution plugin decide what to do afterwards.

##########
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##########
@@ -5561,7 +5563,8 @@ private static void populateLlapDaemonVarsSet(Set<String> llapDaemonVarsSetLocal
         "Size of the runtime statistics cache. Unit is: OperatorStat entry; a query plan consist ~100."),
     HIVE_QUERY_PLANMAPPER_LINK_RELNODES("hive.query.planmapper.link.relnodes", true,
         "Whether to link Calcite nodes to runtime statistics."),
-
+    HIVE_QUERY_MAX_RECOMPILATION_COUNT("hive.query.recompilation.max.count", 1,
+        "Maximum number of re-compilations for a single query."),

Review comment:
       Why do we need the number of recompilations to be configurable?

##########
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##########
@@ -5536,10 +5536,12 @@ private static void populateLlapDaemonVarsSet(Set<String> llapDaemonVarsSetLocal
 
     HIVE_QUERY_REEXECUTION_ENABLED("hive.query.reexecution.enabled", true,
         "Enable query reexecutions"),
-    HIVE_QUERY_REEXECUTION_STRATEGIES("hive.query.reexecution.strategies", "overlay,reoptimize,reexecute_lost_am,dagsubmit",
+    HIVE_QUERY_REEXECUTION_STRATEGIES("hive.query.reexecution.strategies",
+        "overlay,reoptimize,reexecute_lost_am,dagsubmit,recompile_without_cbo",
         "comma separated list of plugin can be used:\n"
             + "  overlay: hiveconf subtree 'reexec.overlay' is used as an overlay in case of an execution errors out\n"
             + "  reoptimize: collects operator statistics during execution and recompile the query after a failure\n"
+            + "  recompile_without_cbo: recompiles query after a CBO failure\n"
             + "  reexecute_lost_am: reexecutes query if it failed due to tez am node gets decommissioned"),

Review comment:
       I didn't go through the whole PR but I get the impression that we could/should combine the `hive.query.reexecution.strategies` and `hive.cbo.fallback.strategy` configurations somehow. Having both does not seem necessary and raises some questions about the expected behavior.
   
   Consider for instance the following:
   ```
   set hive.cbo.fallback.strategy = always;
   -- Note that recompile_without_cbo is missing 
   set hive.query.reexecution.strategies = overlay,reoptimize,reexecute_lost_am,dagsubmit;
   ```
   Reading the current configuration I am not sure what should happen when the CBO fails.
   
   The `hive.cbo.fallback.strategy` has not been released yet so we are free to drop it, or modify it to be consistent with `hive.query.reexecution.strategies`.
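One way to resolve the ambiguity above is to treat the combination as invalid configuration and fail fast. The sketch uses a hypothetical two-value `FallbackStrategySketch` enum in place of the real `CBOFallbackStrategy` (whose full set of values is not shown in this thread) and trims each token so that `overlay, recompile_without_cbo` is also accepted. The `checkHookConfig` method quoted elsewhere in this thread follows a similar idea but splits on "," without trimming.

```java
import java.util.Arrays;

/** Hypothetical stand-in for CBOFallbackStrategy; only the "may we retry?" answer matters here. */
enum FallbackStrategySketch {
  NEVER(false),
  ALWAYS(true);

  private final boolean allowsRetry;

  FallbackStrategySketch(boolean allowsRetry) {
    this.allowsRetry = allowsRetry;
  }

  boolean allowsRetry() {
    return allowsRetry;
  }
}

final class ReExecConfigCheck {
  static final String RECOMPILE_WITHOUT_CBO = "recompile_without_cbo";

  private ReExecConfigCheck() {}

  /** Rejects a fallback strategy that allows retry unless the matching re-execution plugin is enabled. */
  static void validate(FallbackStrategySketch fallback, String strategies) {
    boolean pluginEnabled = strategies != null
        && Arrays.stream(strategies.split(","))
            .map(String::trim) // tolerate "a, b" as well as "a,b"
            .anyMatch(RECOMPILE_WITHOUT_CBO::equals);
    if (fallback.allowsRetry() && !pluginEnabled) {
      throw new IllegalArgumentException("Invalid configuration: fallback strategy " + fallback.name()
          + " requires hive.query.reexecution.strategies to contain '" + RECOMPILE_WITHOUT_CBO + "'");
    }
  }
}
```

With the example above (`always` plus a strategy list without `recompile_without_cbo`), `validate` throws, so the inconsistency surfaces when the query starts rather than at the first CBO failure.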


[GitHub] [hive] zabetak commented on a change in pull request #2865: HIVE-25792: Multi Insert query fails on CBO path

Posted by GitBox <gi...@apache.org>.
zabetak commented on a change in pull request #2865:
URL: https://github.com/apache/hive/pull/2865#discussion_r769864327



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/reexec/ReExecDriver.java
##########
@@ -56,73 +58,92 @@
  * Covers the IDriver interface, handles query re-execution; and asks clear questions from the underlying re-execution plugins.
  */
 public class ReExecDriver implements IDriver {
+  private static final Logger LOG = LoggerFactory.getLogger(ReExecDriver.class);
+  private static final SessionState.LogHelper CONSOLE = new SessionState.LogHelper(LOG);
 
-  private class HandleReOptimizationExplain implements HiveSemanticAnalyzerHook {
-
-    @Override
-    public ASTNode preAnalyze(HiveSemanticAnalyzerHookContext context, ASTNode ast) throws SemanticException {
-      if (ast.getType() == HiveParser.TOK_EXPLAIN) {
-        int childCount = ast.getChildCount();
-        for (int i = 1; i < childCount; i++) {
-          if (ast.getChild(i).getType() == HiveParser.KW_REOPTIMIZATION) {
-            explainReOptimization = true;
-            ast.deleteChild(i);
-            break;
-          }
-        }
-        if (explainReOptimization && firstExecution()) {
-          Tree execTree = ast.getChild(0);
-          execTree.setParent(ast.getParent());
-          ast.getParent().setChild(0, execTree);
-          return (ASTNode) execTree;
-        }
-      }
-      return ast;
-    }
-
-    @Override
-    public void postAnalyze(HiveSemanticAnalyzerHookContext context, List<Task<?>> rootTasks)
-        throws SemanticException {
-    }
-  }
+  private final Driver coreDriver;
+  private final QueryState queryState;
+  private final List<IReExecutionPlugin> plugins;
 
-  private static final Logger LOG = LoggerFactory.getLogger(ReExecDriver.class);
   private boolean explainReOptimization;
-  private Driver coreDriver;
-  private QueryState queryState;
   private String currentQuery;
   private int executionIndex;
 
-  private ArrayList<IReExecutionPlugin> plugins;
+  public ReExecDriver(QueryState queryState, QueryInfo queryInfo, List<IReExecutionPlugin> plugins) {
+    this.queryState = queryState;
+    this.coreDriver = new Driver(queryState, queryInfo, null);
+    this.plugins = plugins;
 
-  @Override
-  public HiveConf getConf() {
-    return queryState.getConf();
+    coreDriver.getHookRunner().addSemanticAnalyzerHook(new HandleReOptimizationExplain());
+    plugins.forEach(p -> p.initialize(coreDriver));
+  }
+
+  @VisibleForTesting
+  public int compile(String command, boolean resetTaskIds) {
+    return coreDriver.compile(command, resetTaskIds);
   }
 
   private boolean firstExecution() {
     return executionIndex == 0;
   }
 
-  public ReExecDriver(QueryState queryState, QueryInfo queryInfo, ArrayList<IReExecutionPlugin> plugins) {
-    this.queryState = queryState;
-    coreDriver = new Driver(queryState, queryInfo, null);
-    coreDriver.getHookRunner().addSemanticAnalyzerHook(new HandleReOptimizationExplain());
-    this.plugins = plugins;
-
-    for (IReExecutionPlugin p : plugins) {
-      p.initialize(coreDriver);
+  private void checkHookConfig() throws CommandProcessorException {
+    String strategies = coreDriver.getConf().getVar(ConfVars.HIVE_QUERY_REEXECUTION_STRATEGIES);
+    CBOFallbackStrategy fallbackStrategy =
+        CBOFallbackStrategy.valueOf(coreDriver.getConf().getVar(ConfVars.HIVE_CBO_FALLBACK_STRATEGY));
+    if (fallbackStrategy.allowsRetry() &&
+        (strategies == null || !Arrays.stream(strategies.split(",")).anyMatch("recompile_without_cbo"::equals))) {
+      String errorMsg = "Invalid configuration. If fallbackStrategy is set to " + fallbackStrategy.name() + " then " +
+          ConfVars.HIVE_QUERY_REEXECUTION_STRATEGIES.varname + " should contain 'recompile_without_cbo'";
+      CONSOLE.printError(errorMsg);
+      throw new CommandProcessorException(errorMsg);

Review comment:
       It might not be necessary to add this check if we change the way these properties interact with each other. For instance, we could drop or replace `hive.cbo.fallback.strategy` with an alternative that fine-tunes the recompilation hook.
   
   This is another indication that keeping both properties as they are right now complicates both the code and the configuration for end users.






[GitHub] [hive] pvary commented on a change in pull request #2865: HIVE-25792: Multi Insert query fails on CBO path

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #2865:
URL: https://github.com/apache/hive/pull/2865#discussion_r767587185



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/reexec/IReExecutionPlugin.java
##########
@@ -42,24 +42,72 @@
   /**
    * Called before executing the query.
    */
-  void beforeExecute(int executionIndex, boolean explainReOptimization);
+  default void beforeExecute(int executionIndex, boolean explainReOptimization) {
+    // default noop
+  }
 
   /**
    * The query have failed, does this plugin advises to re-execute it again?
    */
-  boolean shouldReExecute(int executionNum);
+  default boolean shouldReExecute(int executionNum) {
+    // default no
+    return false;
+  }
 
   /**
-   * The plugin should prepare for the re-compilaton of the query.
+   * The plugin should prepare for the re-compilation of the query.
    */
-  void prepareToReExecute();
+  default void prepareToReExecute() {
+    // default noop
+  }
 
   /**
-   * The query have failed; and have been recompiled - does this plugin advises to re-execute it again?
+   * The query has failed; and have been recompiled - does this plugin advises to re-execute it again?
    */
-  boolean shouldReExecute(int executionNum, PlanMapper oldPlanMapper, PlanMapper newPlanMapper);
+  default boolean shouldReExecute(int executionNum, PlanMapper oldPlanMapper, PlanMapper newPlanMapper) {

Review comment:
       The CBO ReExecute (ReCompile) plugin does not need reExecute...
   
   Let's talk about this. IMHO default implementations make the code of the specific plugins much more readable: no useless code is needed there.
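For illustration, here is a minimal sketch of the default-method pattern being discussed, assuming a simplified interface loosely modeled on `IReExecutionPlugin`. The names `ReExecutionPlugin` and `RecompileOnlyPlugin` are hypothetical, not Hive classes. With no-op defaults, a recompile-only plugin overrides just the one method it cares about.

```java
/**
 * Hypothetical, simplified stand-in for IReExecutionPlugin: every callback
 * has a default implementation, so a plugin overrides only what it needs.
 */
public class DefaultMethodPluginSketch {

  interface ReExecutionPlugin {
    default void beforeExecute(int executionIndex, boolean explainReOptimization) {
      // default no-op
    }

    default boolean shouldReExecute(int executionNum) {
      // by default, never ask for another attempt
      return false;
    }

    default void prepareToReExecute() {
      // default no-op
    }
  }

  /** A recompile-only plugin overrides a single method and inherits the rest. */
  static class RecompileOnlyPlugin implements ReExecutionPlugin {
    private boolean compileFailed;

    void markCompileFailure() {
      compileFailed = true;
    }

    @Override
    public boolean shouldReExecute(int executionNum) {
      return compileFailed;
    }
  }

  public static void main(String[] args) {
    ReExecutionPlugin noop = new ReExecutionPlugin() { }; // relies on defaults only
    RecompileOnlyPlugin recompile = new RecompileOnlyPlugin();

    noop.beforeExecute(0, false);                     // nothing to implement
    System.out.println(noop.shouldReExecute(0));      // false
    recompile.markCompileFailure();
    System.out.println(recompile.shouldReExecute(0)); // true
  }
}
```

The trade-off is that the compiler no longer forces each plugin to decide on every callback; a plugin that forgets to override a method silently gets the no-op.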






[GitHub] [hive] pvary commented on a change in pull request #2865: HIVE-25792: Multi Insert query fails on CBO path

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #2865:
URL: https://github.com/apache/hive/pull/2865#discussion_r770591622



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java
##########
@@ -683,20 +681,17 @@ Operator genOPTree(ASTNode ast, PlannerContext plannerCtx) throws SemanticExcept
             // Wrap all other errors (Should only hit in tests)
             throw new SemanticException(e);
           } else {
-            reAnalyzeAST = true;
+            String strategies = conf.getVar(ConfVars.HIVE_QUERY_REEXECUTION_STRATEGIES);

Review comment:
       Moved the check to the `ReExecDriver.checkHookConfig`






[GitHub] [hive] pvary commented on a change in pull request #2865: HIVE-25792: Multi Insert query fails on CBO path

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #2865:
URL: https://github.com/apache/hive/pull/2865#discussion_r768592941



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/reexec/ReExecDriver.java
##########
@@ -190,52 +229,21 @@ public CommandProcessorResponse run() throws CommandProcessorException {
       }
 
       PlanMapper newPlanMapper = coreDriver.getPlanMapper();
-      if (!explainReOptimization && !shouldReExecuteAfterCompile(oldPlanMapper, newPlanMapper)) {
+      if (!explainReOptimization &&
+          !plugins.stream().anyMatch(p -> p.shouldReExecute(executionIndex, oldPlanMapper, newPlanMapper))) {
         LOG.info("re-running the query would probably not yield better results; returning with last error");
         // FIXME: retain old error; or create a new one?
         return cpr;
       }
     }
   }
 
-  private void afterExecute(PlanMapper planMapper, boolean success) {
-    for (IReExecutionPlugin p : plugins) {
-      p.afterExecute(planMapper, success);
-    }
-  }
-
-  private boolean shouldReExecuteAfterCompile(PlanMapper oldPlanMapper, PlanMapper newPlanMapper) {
-    boolean ret = false;
-    for (IReExecutionPlugin p : plugins) {
-      boolean shouldReExecute = p.shouldReExecute(executionIndex, oldPlanMapper, newPlanMapper);

Review comment:
       Done
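The loop-to-stream change in this hunk can be sketched in isolation. This hypothetical example uses `java.util.function.IntPredicate` in place of the real plugin type. It shows that `anyMatch` short-circuits: once one plugin answers `true`, the remaining plugins are not asked. The rest of the removed loop is not visible in this hunk, so whether the old code consulted every plugin (for example, to log each answer) cannot be confirmed from here. If it did, the stream version changes that behavior.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

public class AnyMatchSketch {

  /** Explicit loop form: stops at the first plugin that answers true. */
  static boolean anyLoop(List<IntPredicate> plugins, int executionIndex) {
    for (IntPredicate p : plugins) {
      if (p.test(executionIndex)) {
        return true;
      }
    }
    return false;
  }

  /** Stream form, as in the diff; anyMatch is also short-circuiting. */
  static boolean anyStream(List<IntPredicate> plugins, int executionIndex) {
    return plugins.stream().anyMatch(p -> p.test(executionIndex));
  }

  public static void main(String[] args) {
    List<String> called = new ArrayList<>();
    List<IntPredicate> plugins = List.of(
        i -> { called.add("first"); return true; },
        i -> { called.add("second"); return false; });

    System.out.println(anyStream(plugins, 0)); // true
    System.out.println(called);                // [first]: the second plugin was never asked
  }
}
```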






[GitHub] [hive] pvary commented on a change in pull request #2865: HIVE-25792: Multi Insert query fails on CBO path

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #2865:
URL: https://github.com/apache/hive/pull/2865#discussion_r769314437



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/reexec/IReExecutionPlugin.java
##########
@@ -42,24 +42,72 @@
   /**
    * Called before executing the query.
    */
-  void beforeExecute(int executionIndex, boolean explainReOptimization);
+  default void beforeExecute(int executionIndex, boolean explainReOptimization) {
+    // default noop
+  }
 
   /**
    * The query have failed, does this plugin advises to re-execute it again?
    */
-  boolean shouldReExecute(int executionNum);
+  default boolean shouldReExecute(int executionNum) {
+    // default no
+    return false;
+  }
 
   /**
-   * The plugin should prepare for the re-compilaton of the query.
+   * The plugin should prepare for the re-compilation of the query.
    */
-  void prepareToReExecute();
+  default void prepareToReExecute() {
+    // default noop
+  }
 
   /**
-   * The query have failed; and have been recompiled - does this plugin advises to re-execute it again?
+   * The query has failed; and have been recompiled - does this plugin advises to re-execute it again?
    */
-  boolean shouldReExecute(int executionNum, PlanMapper oldPlanMapper, PlanMapper newPlanMapper);
+  default boolean shouldReExecute(int executionNum, PlanMapper oldPlanMapper, PlanMapper newPlanMapper) {

Review comment:
       Keeping as discussed






[GitHub] [hive] pvary commented on a change in pull request #2865: HIVE-25792: Multi Insert query fails on CBO path

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #2865:
URL: https://github.com/apache/hive/pull/2865#discussion_r769313969



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/HookRunner.java
##########
@@ -121,19 +121,27 @@ void runBeforeCompileHook(String command) {
   }
 
   /**
-  * Dispatches {@link QueryLifeTimeHook#afterCompile(QueryLifeTimeHookContext, boolean)}.
-  *
-  * @param command the Hive command that is being run
-  * @param compileError true if there was an error while compiling the command, false otherwise
-  */
-  void runAfterCompilationHook(String command, boolean compileError) {
+   * Dispatches {@link QueryLifeTimeHook#afterCompile(QueryLifeTimeHookContext, boolean)}.
+   *
+   * @param driverContext the DriverContext used for generating the HookContext
+   * @param analyzerContext the SemanticAnalyzer context for this query
+   * @param compileException the exception if one was thrown during the compilation
+   */
+  void runAfterCompilationHook(DriverContext driverContext, Context analyzerContext, Throwable compileException) {

Review comment:
       Keeping as it is based on our discussion






[GitHub] [hive] pvary commented on a change in pull request #2865: HIVE-25792: Multi Insert query fails on CBO path

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #2865:
URL: https://github.com/apache/hive/pull/2865#discussion_r771196116



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/reexec/ReExecutionCBOPlugin.java
##########
@@ -0,0 +1,85 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.reexec;
+
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.Driver;
+import org.apache.hadoop.hive.ql.hooks.QueryLifeTimeHook;
+import org.apache.hadoop.hive.ql.hooks.QueryLifeTimeHookContext;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * Re-compiles the query without CBO
+ */
+public class ReExecutionCBOPlugin implements IReExecutionPlugin {

Review comment:
       Done






[GitHub] [hive] pvary commented on a change in pull request #2865: HIVE-25792: Multi Insert query fails on CBO path

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #2865:
URL: https://github.com/apache/hive/pull/2865#discussion_r769593664



##########
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##########
@@ -5536,10 +5536,12 @@ private static void populateLlapDaemonVarsSet(Set<String> llapDaemonVarsSetLocal
 
     HIVE_QUERY_REEXECUTION_ENABLED("hive.query.reexecution.enabled", true,
         "Enable query reexecutions"),
-    HIVE_QUERY_REEXECUTION_STRATEGIES("hive.query.reexecution.strategies", "overlay,reoptimize,reexecute_lost_am,dagsubmit",
+    HIVE_QUERY_REEXECUTION_STRATEGIES("hive.query.reexecution.strategies",
+        "overlay,reoptimize,reexecute_lost_am,dagsubmit,recompile_without_cbo",
         "comma separated list of plugin can be used:\n"
             + "  overlay: hiveconf subtree 'reexec.overlay' is used as an overlay in case of an execution errors out\n"
             + "  reoptimize: collects operator statistics during execution and recompile the query after a failure\n"
+            + "  recompile_without_cbo: recompiles query after a CBO failure\n"
             + "  reexecute_lost_am: reexecutes query if it failed due to tez am node gets decommissioned"),

Review comment:
       That's a good point. I think we should check the two configurations and fail if they are not aligned.
   What do you think?
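The alignment check proposed here could look roughly like the sketch below. It is modeled on the `checkHookConfig` hunk quoted earlier in this thread. The enum, its constants, and which values allow a retry are assumptions for illustration, not Hive's actual `CBOFallbackStrategy`.

```java
import java.util.Arrays;

public class ReExecConfigCheckSketch {

  /** Hypothetical stand-in for the CBO fallback setting; the retry flags are assumed. */
  enum FallbackStrategy {
    NEVER(false), CONSERVATIVE(true), ALWAYS(true), TEST(false);

    private final boolean allowsRetry;

    FallbackStrategy(boolean allowsRetry) {
      this.allowsRetry = allowsRetry;
    }

    boolean allowsRetry() {
      return allowsRetry;
    }
  }

  static final String RECOMPILE_WITHOUT_CBO = "recompile_without_cbo";

  /** Throws if a retrying fallback strategy is set but the recompile plugin is not enabled. */
  static void checkAligned(String fallback, String strategies) {
    FallbackStrategy strategy = FallbackStrategy.valueOf(fallback);
    boolean pluginEnabled = strategies != null
        && Arrays.stream(strategies.split(","))
            .map(String::trim)
            .anyMatch(RECOMPILE_WITHOUT_CBO::equals);
    if (strategy.allowsRetry() && !pluginEnabled) {
      throw new IllegalStateException("Invalid configuration. If the fallback strategy is " + strategy
          + " then the re-execution strategies should contain '" + RECOMPILE_WITHOUT_CBO + "'");
    }
  }

  public static void main(String[] args) {
    checkAligned("ALWAYS", "overlay,reoptimize,recompile_without_cbo"); // passes
    checkAligned("NEVER", "overlay,reoptimize");                          // passes: no retry requested
    try {
      checkAligned("ALWAYS", "overlay,reoptimize");                       // misaligned
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Unlike the quoted hunk, the sketch trims each entry, so `overlay, recompile_without_cbo` with a space after the comma also passes. Whether that leniency is wanted is a design choice.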






[GitHub] [hive] pvary commented on a change in pull request #2865: HIVE-25792: Multi Insert query fails on CBO path

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #2865:
URL: https://github.com/apache/hive/pull/2865#discussion_r769314898



##########
File path: ql/src/test/org/apache/hadoop/hive/ql/optimizer/calcite/TestCBOReCompilation.java
##########
@@ -0,0 +1,115 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.optimizer.calcite;
+
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.conf.HiveConf.ConfVars;
+import org.apache.hadoop.hive.ql.DriverFactory;
+import org.apache.hadoop.hive.ql.IDriver;
+import org.apache.hadoop.hive.ql.processors.CommandProcessorException;
+import org.apache.hadoop.hive.ql.session.SessionState;
+import org.apache.hive.testutils.HiveTestEnvSetup;
+import org.junit.AfterClass;
+import org.junit.Assert;
+import org.junit.BeforeClass;
+import org.junit.ClassRule;
+import org.junit.Test;
+
+public class TestCBOReCompilation {
+
+  @ClassRule
+  public static HiveTestEnvSetup env_setup = new HiveTestEnvSetup();
+
+  @BeforeClass
+  public static void beforeClass() throws Exception {
+    try (IDriver driver = createDriver()) {
+      dropTables(driver);
+      String[] cmds = {
+          // @formatter:off
+          "create table aa1 ( stf_id string)",
+          "create table bb1 ( stf_id string)",
+          "create table cc1 ( stf_id string)",
+          "create table ff1 ( x string)"
+          // @formatter:on
+      };
+      for (String cmd : cmds) {
+        driver.run(cmd);
+      }
+    }
+  }
+
+  @AfterClass
+  public static void afterClass() throws Exception {
+    try (IDriver driver = createDriver()) {
+      dropTables(driver);
+    }
+  }
+
+  public static void dropTables(IDriver driver) throws Exception {
+    String[] tables = new String[] {"aa1", "bb1", "cc1", "ff1" };
+    for (String t : tables) {
+      driver.run("drop table if exists " + t);
+    }
+  }
+
+  @Test
+  public void testReExecutedOnError() throws Exception {
+    try (IDriver driver = createDriver("ALWAYS")) {
+      String query = "explain from ff1 as a join cc1 as b " +
+          "insert overwrite table aa1 select   stf_id GROUP BY b.stf_id " +
+          "insert overwrite table bb1 select b.stf_id GROUP BY b.stf_id";
+      driver.run(query);
+    }
+  }
+
+  @Test
+  public void testFailOnError() throws Exception {
+    try (IDriver driver = createDriver("TEST")) {
+      String query = "explain from ff1 as a join cc1 as b " +
+          "insert overwrite table aa1 select   stf_id GROUP BY b.stf_id " +
+          "insert overwrite table bb1 select b.stf_id GROUP BY b.stf_id";
+      Assert.assertThrows("Plan not optimized by CBO", CommandProcessorException.class, () -> driver.run(query));

Review comment:
       Found other tests created by @zabetak, so there is no need to keep these.






[GitHub] [hive] pvary commented on a change in pull request #2865: HIVE-25792: Multi Insert query fails on CBO path

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #2865:
URL: https://github.com/apache/hive/pull/2865#discussion_r769314240



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/reexec/ReExecDriver.java
##########
@@ -167,20 +201,25 @@ public CommandProcessorResponse run() throws CommandProcessorException {
       }
 
       PlanMapper oldPlanMapper = coreDriver.getPlanMapper();
-      afterExecute(oldPlanMapper, cpr != null);
+      final boolean success = cpr != null;
+      plugins.forEach(p -> p.afterExecute(oldPlanMapper, success));
+
+      // If the execution was successful return the result
+      if (success) {

Review comment:
       Reverted this change






[GitHub] [hive] zabetak commented on a change in pull request #2865: HIVE-25792: Multi Insert query fails on CBO path

Posted by GitBox <gi...@apache.org>.
zabetak commented on a change in pull request #2865:
URL: https://github.com/apache/hive/pull/2865#discussion_r769857181



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java
##########
@@ -673,8 +672,7 @@ Operator genOPTree(ASTNode ast, PlannerContext plannerCtx) throws SemanticExcept
           }
           this.ctx.setCboInfo(cboMsg);
 
-          // Determine if we should re-throw the exception OR if we try to mark plan as reAnalyzeAST to retry
-          // planning as non-CBO.
+          // Determine if we should re-throw the exception OR if we try to mark the query to retry as non-CBO.

Review comment:
       Interesting, I was about to comment that `ReCompileException` would be better as `CBOException` :) It really feels like this is a better approach. 
   
   If we review this part of the code independently of what happens elsewhere, it seems reasonable that we are catching an exception and wrapping it in another, more "high-level" abstraction. This is usually a good coding pattern.
   
   Apart from that, I don't know if it is crucial to retain exactly the same outputs in tests. For instance, if the changes you mention are due to the following part of the code:
   ```
               // Wrap all other errors (Should only hit in tests)
               throw new SemanticException(e);
   ```
   then it makes me believe even more that we shouldn't care.
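Here is a minimal sketch of the wrapping pattern described above, assuming a hypothetical `CboFailureException` (not a Hive class). The low-level error is kept as the cause, and callers recognize the failure by walking the cause chain rather than by matching on message text.

```java
public class ExceptionWrappingSketch {

  /** Hypothetical high-level exception: "CBO failed; a non-CBO retry may help". */
  static class CboFailureException extends Exception {
    CboFailureException(String message, Throwable cause) {
      super(message, cause);
    }
  }

  /** Stands in for the CBO planning step; the failure is simulated. */
  static void planWithCbo() throws CboFailureException {
    try {
      throw new UnsupportedOperationException("construct not supported on the CBO path");
    } catch (RuntimeException e) {
      // Wrap the low-level error in a higher-level abstraction, keeping it as the cause.
      throw new CboFailureException("CBO planning failed", e);
    }
  }

  /** Walks the cause chain so the failure is recognized even after further wrapping. */
  static boolean isCboFailure(Throwable t) {
    for (Throwable cur = t; cur != null; cur = cur.getCause()) {
      if (cur instanceof CboFailureException) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    try {
      try {
        planWithCbo();
      } catch (CboFailureException e) {
        // e.g. an outer layer wrapping it again in its own exception type
        throw new RuntimeException("compilation failed", e);
      }
    } catch (RuntimeException outer) {
      System.out.println(isCboFailure(outer)); // true
    }
  }
}
```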






[GitHub] [hive] pvary commented on a change in pull request #2865: HIVE-25792: Multi Insert query fails on CBO path

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #2865:
URL: https://github.com/apache/hive/pull/2865#discussion_r767588134



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/reexec/ReExecutionCBOPlugin.java
##########
@@ -0,0 +1,97 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.reexec;
+
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.Driver;
+import org.apache.hadoop.hive.ql.hooks.QueryLifeTimeHook;
+import org.apache.hadoop.hive.ql.hooks.QueryLifeTimeHookContext;
+import org.apache.hadoop.hive.ql.parse.CBOException;
+import org.apache.hadoop.hive.ql.parse.SemanticException;
+
+/**
+ * Re-compiles the query without CBO
+ */
+public class ReExecutionCBOPlugin implements IReExecutionPlugin {
+
+  private Driver driver;
+  private boolean retryPossible = false;
+  private CBOFallbackStrategy fallbackStrategy;
+
+  class LocalHook implements QueryLifeTimeHook {
+    @Override
+    public void beforeCompile(QueryLifeTimeHookContext ctx) {
+      // noop
+    }
+
+    @Override
+    public void afterCompile(QueryLifeTimeHookContext ctx, boolean hasError) {
+      if (hasError) {

Review comment:
       Good point. Thanks! This makes it much less error-prone.
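One way to structure the plugin and hook interaction so that it stays hard to misuse is sketched below. It assumes a simplified hook that receives the compile exception (in the spirit of the `Throwable compileException` parameter in the `HookRunner` hunk above) and a hypothetical `CboFailureException`; none of these names are Hive's. The flag is reset when read, so a stale failure cannot trigger a second recompilation.

```java
public class RecompilePluginSketch {

  /** Hypothetical marker exception for CBO failures. */
  static class CboFailureException extends RuntimeException {
    CboFailureException(String message) {
      super(message);
    }
  }

  /** Hypothetical, simplified lifecycle hook with a single callback. */
  interface CompileHook {
    void afterCompile(Throwable compileException);
  }

  private boolean retryPossible;

  /** Records whether the last compilation failed in CBO; null (success) yields false. */
  final CompileHook hook = compileException ->
      retryPossible = compileException instanceof CboFailureException;

  /** Consumes the flag so an earlier failure cannot trigger another recompilation. */
  boolean shouldReCompile() {
    boolean result = retryPossible;
    retryPossible = false;
    return result;
  }

  public static void main(String[] args) {
    RecompilePluginSketch plugin = new RecompilePluginSketch();
    plugin.hook.afterCompile(new CboFailureException("CBO failed"));
    System.out.println(plugin.shouldReCompile()); // true
    System.out.println(plugin.shouldReCompile()); // false: flag already consumed
    plugin.hook.afterCompile(new IllegalArgumentException("syntax error"));
    System.out.println(plugin.shouldReCompile()); // false: not a CBO failure
  }
}
```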






[GitHub] [hive] pvary commented on a change in pull request #2865: HIVE-25792: Multi Insert query fails on CBO path

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #2865:
URL: https://github.com/apache/hive/pull/2865#discussion_r767580354



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/HookRunner.java
##########
@@ -121,19 +121,27 @@ void runBeforeCompileHook(String command) {
   }
 
   /**
-  * Dispatches {@link QueryLifeTimeHook#afterCompile(QueryLifeTimeHookContext, boolean)}.
-  *
-  * @param command the Hive command that is being run
-  * @param compileError true if there was an error while compiling the command, false otherwise
-  */
-  void runAfterCompilationHook(String command, boolean compileError) {
+   * Dispatches {@link QueryLifeTimeHook#afterCompile(QueryLifeTimeHookContext, boolean)}.
+   *
+   * @param driverContext the DriverContext used for generating the HookContext
+   * @param analyzerContext the SemanticAnalyzer context for this query
+   * @param compileException the exception if one was thrown during the compilation
+   */
+  void runAfterCompilationHook(DriverContext driverContext, Context analyzerContext, Throwable compileException) {

Review comment:
       This is the original code from `Executor`:
   ```
         hookContext = new PrivateHookContext(driverContext.getPlan(), driverContext.getQueryState(),
             context.getPathToCS(), SessionState.get().getUserName(), SessionState.get().getUserIpAddress(),
             InetAddress.getLocalHost().getHostAddress(), driverContext.getOperationId(),
             SessionState.get().getSessionId(), Thread.currentThread().getName(), SessionState.get().isHiveServerQuery(),
             SessionState.getPerfLogger(), driverContext.getQueryInfo(), context);
   ```
   
   This uses `context`, and `driverContext`. Any idea how to replace the usage?






[GitHub] [hive] pvary commented on a change in pull request #2865: HIVE-25792: Multi Insert query fails on CBO path

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #2865:
URL: https://github.com/apache/hive/pull/2865#discussion_r770590909



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/Driver.java
##########
@@ -515,14 +516,16 @@ private void preparForCompile(boolean resetTaskIds) throws CommandProcessorExcep
   }
 
   private void prepareContext() throws CommandProcessorException {
+    String originalCboInfo = context != null ? context.cboInfo : null;

Review comment:
       Sadly, this happens inside `coreDriver.compileAndRespond(statement)`, so even if we move the change to `beforeCompile`, the context will be recreated in `prepareContext()`.
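To illustrate the constraint: because `prepareContext()` builds a fresh context on every compilation, anything that must survive a recompilation has to be copied out of the old context before it is replaced. The sketch below assumes a hypothetical `QueryContext` with a single `cboInfo` field. The restore step is an assumption, since only the first line of the `prepareContext()` change is shown in the hunk.

```java
public class ContextCarryOverSketch {

  /** Hypothetical stand-in for the per-compilation Context; only cboInfo matters here. */
  static class QueryContext {
    String cboInfo;
  }

  QueryContext context;

  /** Recreates the context, carrying the previous attempt's CBO info over (assumed behavior). */
  void prepareContext() {
    String originalCboInfo = context != null ? context.cboInfo : null;
    context = new QueryContext();
    context.cboInfo = originalCboInfo;
  }

  public static void main(String[] args) {
    ContextCarryOverSketch driver = new ContextCarryOverSketch();
    driver.prepareContext();                    // first compilation
    driver.context.cboInfo = "CBO failed";      // set while planning
    driver.prepareContext();                    // recompilation gets a fresh context
    System.out.println(driver.context.cboInfo); // CBO failed
  }
}
```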






[GitHub] [hive] kgyrtkirk commented on a change in pull request #2865: HIVE-25792: Multi Insert query fails on CBO path

Posted by GitBox <gi...@apache.org>.
kgyrtkirk commented on a change in pull request #2865:
URL: https://github.com/apache/hive/pull/2865#discussion_r771451942



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java
##########
@@ -673,8 +672,7 @@ Operator genOPTree(ASTNode ast, PlannerContext plannerCtx) throws SemanticExcept
           }
           this.ctx.setCboInfo(cboMsg);
 
-          // Determine if we should re-throw the exception OR if we try to mark plan as reAnalyzeAST to retry
-          // planning as non-CBO.
+          // Determine if we should re-throw the exception OR if we try to mark the query to retry as non-CBO.

Review comment:
       @pvary explored this in the first patch, and we ended up changing error messages, exception types, etc., because of the tighter co-location of the logic and the recompilation stuff.
   
   I suggest separating those into a new changeset and getting this one in separately.




[GitHub] [hive] pvary merged pull request #2865: HIVE-25792: Multi Insert query fails on CBO path

Posted by GitBox <gi...@apache.org>.
pvary merged pull request #2865:
URL: https://github.com/apache/hive/pull/2865


   


[GitHub] [hive] pvary commented on a change in pull request #2865: HIVE-25792: Multi Insert query fails on CBO path

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #2865:
URL: https://github.com/apache/hive/pull/2865#discussion_r767584856



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/reexec/IReExecutionPlugin.java
##########
@@ -42,24 +42,72 @@
   /**
    * Called before executing the query.
    */
-  void beforeExecute(int executionIndex, boolean explainReOptimization);
+  default void beforeExecute(int executionIndex, boolean explainReOptimization) {
+    // default noop
+  }
 
   /**
    * The query have failed, does this plugin advises to re-execute it again?
    */
-  boolean shouldReExecute(int executionNum);
+  default boolean shouldReExecute(int executionNum) {

Review comment:
       Let's talk about this. I do not know; I inherited this.
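The diff above turns plugin callbacks into `default` methods, so a plugin only has to override the callbacks it actually uses. A minimal sketch of that pattern with a hypothetical, trimmed-down interface (not the real `IReExecutionPlugin`); the diff's own default body for `shouldReExecute` is cut off here, so the `return false` below is an assumption for this sketch only.

```java
/** Hypothetical, trimmed-down plugin interface; not the real IReExecutionPlugin. */
public class DefaultMethodPlugins {

  interface ReExecutionPlugin {
    default void beforeExecute(int executionIndex, boolean explainReOptimization) {
      // default noop
    }

    default boolean shouldReExecute(int executionNum) {
      return false; // assumed default for this sketch: never ask for a re-execution
    }
  }

  /** Overrides nothing: inherits both defaults. */
  static class PassivePlugin implements ReExecutionPlugin {
  }

  /** Overrides only the callback it cares about. */
  static class RetryAfterFirstRunPlugin implements ReExecutionPlugin {
    @Override
    public boolean shouldReExecute(int executionNum) {
      return executionNum == 1; // hypothetical rule: allow exactly one retry
    }
  }

  public static void main(String[] args) {
    System.out.println(new PassivePlugin().shouldReExecute(1));            // false
    System.out.println(new RetryAfterFirstRunPlugin().shouldReExecute(1)); // true
    System.out.println(new RetryAfterFirstRunPlugin().shouldReExecute(2)); // false
  }
}
```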




[GitHub] [hive] pvary commented on a change in pull request #2865: HIVE-25792: Multi Insert query fails on CBO path

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #2865:
URL: https://github.com/apache/hive/pull/2865#discussion_r768571234



##########
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##########
@@ -5536,10 +5536,12 @@ private static void populateLlapDaemonVarsSet(Set<String> llapDaemonVarsSetLocal
 
     HIVE_QUERY_REEXECUTION_ENABLED("hive.query.reexecution.enabled", true,
         "Enable query reexecutions"),
-    HIVE_QUERY_REEXECUTION_STRATEGIES("hive.query.reexecution.strategies", "overlay,reoptimize,reexecute_lost_am,dagsubmit",
+    HIVE_QUERY_REEXECUTION_STRATEGIES("hive.query.reexecution.strategies",
+        "overlay,reoptimize,reexecute_lost_am,dagsubmit,reexecute_cbo",
         "comma separated list of plugin can be used:\n"
             + "  overlay: hiveconf subtree 'reexec.overlay' is used as an overlay in case of an execution errors out\n"
             + "  reoptimize: collects operator statistics during execution and recompile the query after a failure\n"
+            + "  reexecute_cbo: reexecutes query after a CBO failure\n"

Review comment:
       Renamed to `recompile_without_cbo`




[GitHub] [hive] pvary commented on a change in pull request #2865: HIVE-25792: Multi Insert query fails on CBO path

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #2865:
URL: https://github.com/apache/hive/pull/2865#discussion_r769810983



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/ExplainSemanticAnalyzer.java
##########
@@ -156,7 +157,11 @@ public void analyzeInternal(ASTNode ast) throws SemanticException {
           while (driver.getResults(new ArrayList<String>())) {
           }
         } catch (CommandProcessorException e) {
-          throw new SemanticException(e.getMessage(), e);
+          if (e.getCause() instanceof ReCompileException) {

Review comment:
       org.apache.hadoop.hive.cli.split8.TestMiniLlapLocalCliDriver[explainanalyze_2]




[GitHub] [hive] kgyrtkirk commented on a change in pull request #2865: HIVE-25792: Multi Insert query fails on CBO path

Posted by GitBox <gi...@apache.org>.
kgyrtkirk commented on a change in pull request #2865:
URL: https://github.com/apache/hive/pull/2865#discussion_r769679549



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/Driver.java
##########
@@ -515,14 +516,16 @@ private void preparForCompile(boolean resetTaskIds) throws CommandProcessorExcep
   }
 
   private void prepareContext() throws CommandProcessorException {
+    String originalCboInfo = context != null ? context.cboInfo : null;

Review comment:
       I feel that this save/load is a bit out of scope here;
   I think it might be possible to move it into `ReCompilePlugin#Hook`. What do you think?
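A minimal sketch of the suggested alternative: the hook remembers `cboInfo` in `afterCompile` when compilation fails and puts it back in `beforeCompile` on the next attempt. `Context` and `SaveRestoreHook` are hypothetical simplified types, only loosely modeled on `QueryLifeTimeHook`'s two callbacks.

```java
/** Hypothetical sketch of keeping cboInfo in the re-compile hook instead of in Driver. */
public class CboInfoHook {

  /** Simplified stand-in for the per-compilation context; not Hive's Context class. */
  static class Context {
    String cboInfo;
  }

  /** Simplified hook with beforeCompile/afterCompile callbacks, loosely modeled on QueryLifeTimeHook. */
  static class SaveRestoreHook {
    private String savedCboInfo; // kept across compilation attempts

    void beforeCompile(Context ctx) {
      if (savedCboInfo != null) {
        ctx.cboInfo = savedCboInfo; // restore what the failed attempt recorded
      }
    }

    void afterCompile(Context ctx, boolean hasError) {
      if (hasError) {
        savedCboInfo = ctx.cboInfo; // remember it for the retry
      }
    }
  }

  /** Runs a failed attempt followed by a retry on a fresh context and returns the retry's cboInfo. */
  static String simulateRetry(String failureMessage) {
    SaveRestoreHook hook = new SaveRestoreHook();

    Context first = new Context();
    hook.beforeCompile(first);
    first.cboInfo = failureMessage; // recorded by the failed CBO attempt
    hook.afterCompile(first, true);

    Context retry = new Context(); // the retry gets a new context
    hook.beforeCompile(retry);
    return retry.cboInfo;
  }

  public static void main(String[] args) {
    System.out.println(simulateRetry("example CBO failure message"));
  }
}
```

This only works if `beforeCompile` receives the context that compilation will actually use; the reply in this thread points out that in `Driver` the context is recreated afterwards, in `prepareContext()`.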

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java
##########
@@ -683,20 +681,17 @@ Operator genOPTree(ASTNode ast, PlannerContext plannerCtx) throws SemanticExcept
             // Wrap all other errors (Should only hit in tests)
             throw new SemanticException(e);
           } else {
-            reAnalyzeAST = true;
+            String strategies = conf.getVar(ConfVars.HIVE_QUERY_REEXECUTION_STRATEGIES);

Review comment:
       I think it would be clearer what you are doing and why if you moved this into some method like `ReCompilationPlugin#isEnabled(conf)` or something similar.
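A minimal sketch of the kind of helper suggested here. `ReCompileStrategies` and its `isEnabled` method are hypothetical names; to stay self-contained the method takes the raw value of `hive.query.reexecution.strategies` instead of a `HiveConf`. Unlike the inline check in the diff, it trims each entry, so a list written as `"overlay, recompile_without_cbo"` still matches.

```java
import java.util.Arrays;

/** Hypothetical helper in the spirit of the suggested ReCompilationPlugin#isEnabled(conf). */
public final class ReCompileStrategies {

  static final String RECOMPILE_WITHOUT_CBO = "recompile_without_cbo";

  private ReCompileStrategies() {
  }

  /** Returns true when the comma-separated strategy list contains recompile_without_cbo. */
  static boolean isEnabled(String strategies) {
    if (strategies == null) {
      return false;
    }
    return Arrays.stream(strategies.split(","))
        .map(String::trim)
        .anyMatch(RECOMPILE_WITHOUT_CBO::equals);
  }

  public static void main(String[] args) {
    System.out.println(isEnabled("overlay,reoptimize,reexecute_lost_am,dagsubmit,recompile_without_cbo")); // true
    System.out.println(isEnabled("overlay,reoptimize")); // false
  }
}
```

With a named helper, the planner code reads as `if (!ReCompileStrategies.isEnabled(...)) throw ...`, which states the intent directly.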

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/ExplainSemanticAnalyzer.java
##########
@@ -156,7 +157,11 @@ public void analyzeInternal(ASTNode ast) throws SemanticException {
           while (driver.getResults(new ArrayList<String>())) {
           }
         } catch (CommandProcessorException e) {
-          throw new SemanticException(e.getMessage(), e);
+          if (e.getCause() instanceof ReCompileException) {

Review comment:
       I think right now there is no scenario in which this `if` would be true; this exception is only thrown when it will be handled.

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java
##########
@@ -683,20 +681,17 @@ Operator genOPTree(ASTNode ast, PlannerContext plannerCtx) throws SemanticExcept
             // Wrap all other errors (Should only hit in tests)
             throw new SemanticException(e);
           } else {
-            reAnalyzeAST = true;
+            String strategies = conf.getVar(ConfVars.HIVE_QUERY_REEXECUTION_STRATEGIES);
+            if (strategies == null || !Arrays.stream(strategies.split(",")).anyMatch("recompile_without_cbo"::equals)) {
+              throw new SemanticException("Invalid configuration. If fallbackStrategy is set to " + fallbackStrategy.name() +
+                  " then " + ConfVars.HIVE_QUERY_REEXECUTION_STRATEGIES.varname + " should contain 'recompile_without_cbo'");

Review comment:
       If this is a sanity check, then I think this exception should be thrown every time, regardless of whether we are trying to fall back or not.
   
   Imagine the following:
   almost all queries are handled by CBO, and then one query comes along and hits this exception... Instead of seeing what the actual issue is, they will start investigating this exception...
   
   You could move this check somewhere into `ReCompilePlugin` and run it when the plugin is not initialized. I think that would be a better place to throw this exception; or not throw it at all.
   
   ...or we could add `ReCompilePlugin` based on the actual setting of `fallbackStrategy`, but in that case this would become a secret contract... (there's always a flip side :D)
   ...but automating things is also problematic: if we add it automatically, shouldn't we also throw an exception if re-execution is disabled altogether?
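A minimal sketch of the fail-fast variant discussed above: validate the configuration once, up front, instead of only when some query happens to take the fallback path. `FallbackConfigCheck`, `validate`, and the boolean `fallbackToNonCbo` flag are hypothetical; the flag stands in for whatever `fallbackStrategy` setting permits a non-CBO retry, which this thread does not spell out.

```java
import java.util.Arrays;

/**
 * Hypothetical fail-fast check: validate the configuration once, when the plugin set is built,
 * instead of only when some query happens to take the CBO fallback path.
 */
public final class FallbackConfigCheck {

  static final String RECOMPILE_WITHOUT_CBO = "recompile_without_cbo";

  private FallbackConfigCheck() {
  }

  /**
   * @param fallbackToNonCbo assumed flag: true when the CBO fallback strategy allows a non-CBO retry
   * @param strategies raw value of hive.query.reexecution.strategies
   */
  static void validate(boolean fallbackToNonCbo, String strategies) {
    boolean pluginListed = strategies != null
        && Arrays.stream(strategies.split(","))
            .map(String::trim)
            .anyMatch(RECOMPILE_WITHOUT_CBO::equals);
    if (fallbackToNonCbo && !pluginListed) {
      throw new IllegalStateException("Invalid configuration: non-CBO fallback is enabled, so "
          + "hive.query.reexecution.strategies should contain '" + RECOMPILE_WITHOUT_CBO + "'");
    }
  }

  public static void main(String[] args) {
    validate(true, "overlay,reoptimize,recompile_without_cbo"); // passes
    validate(false, "overlay");                                 // passes: no fallback requested
    try {
      validate(true, "overlay,reoptimize");
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Run at startup or plugin initialization, a misconfiguration surfaces for every query at once rather than masking the real CBO error of the one query that fell back.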

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/reexec/ReExecDriver.java
##########
@@ -190,52 +229,21 @@ public CommandProcessorResponse run() throws CommandProcessorException {
       }
 
       PlanMapper newPlanMapper = coreDriver.getPlanMapper();
-      if (!explainReOptimization && !shouldReExecuteAfterCompile(oldPlanMapper, newPlanMapper)) {
+      if (!explainReOptimization &&
+          !plugins.stream().anyMatch(p -> p.shouldReExecute(executionIndex, oldPlanMapper, newPlanMapper))) {
         LOG.info("re-running the query would probably not yield better results; returning with last error");
         // FIXME: retain old error; or create a new one?
         return cpr;
       }
     }
   }
 
-  private void afterExecute(PlanMapper planMapper, boolean success) {
-    for (IReExecutionPlugin p : plugins) {
-      p.afterExecute(planMapper, success);
-    }
-  }
-
-  private boolean shouldReExecuteAfterCompile(PlanMapper oldPlanMapper, PlanMapper newPlanMapper) {
-    boolean ret = false;
-    for (IReExecutionPlugin p : plugins) {
-      boolean shouldReExecute = p.shouldReExecute(executionIndex, oldPlanMapper, newPlanMapper);
-      LOG.debug("{}.shouldReExecuteAfterCompile = {}", p, shouldReExecute);

Review comment:
       keep the latest changeset
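The diff replaces the `shouldReExecuteAfterCompile` loop, which logs every plugin's answer at debug level, with `plugins.stream().anyMatch(...)`. One behavioral difference worth keeping in mind: `Stream.anyMatch` short-circuits, so once a plugin answers `true` the remaining plugins are not asked (and not logged). A minimal, self-contained illustration with hypothetical plugins modeled as `IntPredicate`s:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

/** Shows that Stream.anyMatch stops at the first true result, while a plain loop asks every plugin. */
public class AnyMatchShortCircuit {

  /** Hypothetical plugins: each records its name when asked, then answers. */
  static List<IntPredicate> plugins(List<String> calls) {
    List<IntPredicate> result = new ArrayList<>();
    result.add(i -> { calls.add("first"); return true; });
    result.add(i -> { calls.add("second"); return false; });
    return result;
  }

  static List<String> callsWithAnyMatch() {
    List<String> calls = new ArrayList<>();
    plugins(calls).stream().anyMatch(p -> p.test(0));
    return calls;
  }

  static List<String> callsWithLoop() {
    List<String> calls = new ArrayList<>();
    for (IntPredicate p : plugins(calls)) {
      p.test(0); // the removed loop logs each plugin's answer at this point
    }
    return calls;
  }

  public static void main(String[] args) {
    System.out.println(callsWithAnyMatch()); // [first]
    System.out.println(callsWithLoop());     // [first, second]
  }
}
```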

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/reexec/ReExecutionCBOPlugin.java
##########
@@ -0,0 +1,80 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.reexec;
+
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.Driver;
+import org.apache.hadoop.hive.ql.hooks.QueryLifeTimeHook;
+import org.apache.hadoop.hive.ql.hooks.QueryLifeTimeHookContext;
+
+/**
+ * Re-compiles the query without CBO
+ */
+public class ReExecutionCBOPlugin implements IReExecutionPlugin {
+
+  private Driver driver;
+  private boolean retryPossible;
+  private String cboMsg;
+
+  class LocalHook implements QueryLifeTimeHook {
+    @Override
+    public void beforeCompile(QueryLifeTimeHookContext ctx) {
+      // noop
+    }
+
+    @Override
+    public void afterCompile(QueryLifeTimeHookContext ctx, boolean hasError) {
+      if (hasError) {
+        Throwable throwable = ctx.getHookContext().getException();
+        retryPossible = throwable != null && throwable instanceof ReCompileException;

Review comment:
       I think it would be good to have a log message about this decision.
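A minimal sketch of logging the retry decision made in `afterCompile`. It uses `java.util.logging` only so the example has no dependencies (Hive itself logs through SLF4J), and `ReCompileException` below is a local stand-in rather than Hive's class. Since `instanceof` already evaluates to `false` for `null`, the separate `throwable != null` check in the diff is not needed.

```java
import java.util.logging.Logger;

/** Hypothetical sketch of logging the retry decision made in afterCompile. */
public class RetryDecisionLogging {

  private static final Logger LOG = Logger.getLogger(RetryDecisionLogging.class.getName());

  /** Stand-in for Hive's ReCompileException, so the sketch compiles on its own. */
  static class ReCompileException extends Exception {
    ReCompileException(String message) {
      super(message);
    }
  }

  private boolean retryPossible;

  void afterCompile(Throwable throwable, boolean hasError) {
    if (hasError) {
      // instanceof is false for null, so a separate null check is not needed
      retryPossible = throwable instanceof ReCompileException;
      LOG.info("Compilation failed with "
          + (throwable == null ? "no exception" : throwable.getClass().getSimpleName())
          + "; recompiling without CBO is " + (retryPossible ? "possible" : "not possible"));
    }
  }

  boolean isRetryPossible() {
    return retryPossible;
  }

  public static void main(String[] args) {
    RetryDecisionLogging hook = new RetryDecisionLogging();
    hook.afterCompile(new ReCompileException("CBO failed"), true);
    System.out.println(hook.isRetryPossible()); // true
    hook.afterCompile(new IllegalStateException("other failure"), true);
    System.out.println(hook.isRetryPossible()); // false
  }
}
```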




[GitHub] [hive] pvary commented on a change in pull request #2865: HIVE-25792: Multi Insert query fails on CBO path

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #2865:
URL: https://github.com/apache/hive/pull/2865#discussion_r769598125



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java
##########
@@ -673,8 +672,7 @@ Operator genOPTree(ASTNode ast, PlannerContext plannerCtx) throws SemanticExcept
           }
           this.ctx.setCboInfo(cboMsg);
 
-          // Determine if we should re-throw the exception OR if we try to mark plan as reAnalyzeAST to retry
-          // planning as non-CBO.
+          // Determine if we should re-throw the exception OR if we try to mark the query to retry as non-CBO.

Review comment:
       That was my first implementation, but it messed up the current query outputs (qfiles).
   I tried throwing the ReCompileException (called CBOException at the time) and then deciding based on the exception. The problem arose when, to keep the same outputs, I had to extract and rethrow the cause of the CBOException whenever the exception was fatal. I chickened out of that path to keep the existing logic in place.
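A minimal sketch of the approach described here and then abandoned: the planner always signals a CBO failure with a wrapper exception, and the caller either retries without CBO or, for fatal failures, unwraps and rethrows the original cause so the error output stays the same. All names are hypothetical; `ReCompileException` is a local stand-in, and the plan strings are placeholders.

```java
/**
 * Hypothetical sketch of the abandoned approach: the planner always throws a wrapper exception
 * on CBO failure, and the caller either retries without CBO or unwraps and rethrows the
 * original cause so that error output stays unchanged.
 */
public class WrapAndUnwrap {

  /** Stand-in for the wrapper (CBOException, later ReCompileException, in the thread). */
  static class ReCompileException extends Exception {
    ReCompileException(String message, Throwable cause) {
      super(message, cause);
    }
  }

  /** Planner side: returns a CBO plan, or wraps any CBO failure. */
  static String planWithCbo(Exception cboFailure) throws ReCompileException {
    if (cboFailure == null) {
      return "CBO plan";
    }
    throw new ReCompileException("CBO failed", cboFailure);
  }

  /** Caller side: retry without CBO, unless the failure is fatal. */
  static String compile(Exception cboFailure, boolean fatal) throws Exception {
    try {
      return planWithCbo(cboFailure);
    } catch (ReCompileException e) {
      if (fatal) {
        // Unwrapping is needed to reproduce the original exception type and message.
        Throwable cause = e.getCause();
        if (cause instanceof Exception) {
          throw (Exception) cause;
        }
        throw e;
      }
      return "non-CBO plan";
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println(compile(null, false));                                // CBO plan
    System.out.println(compile(new RuntimeException("unsupported"), false)); // non-CBO plan
    try {
      compile(new IllegalArgumentException("fatal"), true);
    } catch (IllegalArgumentException e) {
      System.out.println("original exception rethrown: " + e.getMessage());
    }
  }
}
```

Every caller that must preserve existing error output has to repeat the unwrapping step, which is the extra coupling the reply describes.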



