You are viewing a plain text version of this content. The canonical link for it is here.
Posted to gitbox@hive.apache.org by GitBox <gi...@apache.org> on 2020/09/05 01:06:51 UTC

[GitHub] [hive] vineetgarg02 opened a new pull request #1472: [DRAFT] Support partition pruning, vectorization and other physical transformations for EXECUTE statement

vineetgarg02 opened a new pull request #1472:
URL: https://github.com/apache/hive/pull/1472


   <!--
   Thanks for sending a pull request!  Here are some tips for you:
     1. If this is your first time, please read our contributor guidelines: https://cwiki.apache.org/confluence/display/Hive/HowToContribute
     2. Ensure that you have created an issue on the Hive project JIRA: https://issues.apache.org/jira/projects/HIVE/summary
     3. Ensure you have added or run the appropriate tests for your PR: 
     4. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP]HIVE-XXXXX:  Your PR title ...'.
     5. Be sure to keep the PR description updated to reflect all changes.
     6. Please write your PR title to summarize what this PR proposes.
     7. If possible, provide a concise example to reproduce the issue for a faster review.
   
   -->
   
   ### What changes were proposed in this pull request?
   <!--
   Please clarify what changes you are proposing. The purpose of this section is to outline the changes and how this PR fixes the issue. 
   If possible, please consider writing useful notes for better and faster reviews in your PR. See the examples below.
     1. If you refactor some codes with changing classes, showing the class hierarchy will help reviewers.
     2. If you fix some SQL features, you can provide some references of other DBMSes.
     3. If there is design documentation, please add the link.
     4. If there is a discussion in the mailing list, please add the link.
   -->
   
   
   ### Why are the changes needed?
   <!--
   Please clarify why the changes are needed. For instance,
     1. If you propose a new API, clarify the use case for a new API.
     2. If you fix a bug, you can clarify why it is a bug.
   -->
   
   
   ### Does this PR introduce _any_ user-facing change?
   <!--
   Note that it means *any* user-facing change including all aspects such as the documentation fix.
   If yes, please clarify the previous behavior and the change this PR proposes - provide the console output, description, screenshot and/or a reproducable example to show the behavior difference if possible.
   If possible, please also clarify if this is a user-facing change compared to the released Hive versions or within the unreleased branches such as master.
   If no, write 'No'.
   -->
   
   
   ### How was this patch tested?
   <!--
   If tests were added, say they were added here. Please make sure to add some test cases that check the changes thoroughly including negative and positive cases if possible.
   If it was tested in a way different from regular unit tests, please clarify how you tested step by step, ideally copy and paste-able, so that other reviewers can test and check, and descendants can verify in the future.
   If tests were not added, please describe why they were not added and/or why it was difficult to add.
   -->
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] jcamachor commented on a change in pull request #1472: HIVE-24009 Support partition pruning, vectorization and other physical transformations for EXECUTE statement

Posted by GitBox <gi...@apache.org>.
jcamachor commented on a change in pull request #1472:
URL: https://github.com/apache/hive/pull/1472#discussion_r491751273



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/ExecuteStatementAnalyzer.java
##########
@@ -253,14 +191,17 @@ public void analyzeInternal(ASTNode root) throws SemanticException {
     String queryName = getQueryName(root);
     if (ss.getPreparePlans().containsKey(queryName)) {
       // retrieve cached plan from session state
-      BaseSemanticAnalyzer cachedPlan = ss.getPreparePlans().get(queryName);
+      SemanticAnalyzer cachedPlan = ss.getPreparePlans().get(queryName);
 
       // make copy of the plan
-      createTaskCopy(cachedPlan);
+      //createTaskCopy(cachedPlan);

Review comment:
       Can remove line commented out.

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/PrepareStatementAnalyzer.java
##########
@@ -54,6 +58,21 @@ private void savePlan(String queryName) throws SemanticException{
     ss.getPreparePlans().put(queryName, this);
   }
 
+  private <T> T makeCopy(final Object task, Class<T> objClass) {
+    ByteArrayOutputStream baos = new ByteArrayOutputStream();

Review comment:
       Can we leave a comment on this method to understand what it is trying to do?

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java
##########
@@ -63,19 +63,19 @@
 
   private VectorizationContext taskVectorizationContext;
 
-  protected transient JobConf jc;
-  private transient boolean inputFileChanged = false;
+  protected JobConf jc;

Review comment:
       Do we need to keep all these fields for the plan cache in the operator, table, etc.? I am wondering about the implications of keeping them when the operator plan is serialized (i.e., whether that could have an performance impact). @t3rmin4t0r , @rbalamohan , could you comment on this?

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/ExecuteStatementAnalyzer.java
##########
@@ -286,6 +227,24 @@ public void analyzeInternal(ASTNode root) throws SemanticException {
       this.acidFileSinks.addAll(cachedPlan.getAcidFileSinks());
       this.initCtx(cachedPlan.getCtx());
       this.ctx.setCboInfo(cachedPlan.getCboInfo());
+      this.setLoadFileWork(cachedPlan.getLoadFileWork());
+      this.setLoadTableWork(cachedPlan.getLoadTableWork());
+
+      this.setQB(cachedPlan.getQB());
+
+      ParseContext pctxt = this.getParseContext();
+      // partition pruner
+      Transform ppr = new PartitionPruner();
+      ppr.transform(pctxt);
+
+      //pctxt.setQueryProperties(this.queryProperties);
+      if (!ctx.getExplainLogical()) {
+        TaskCompiler compiler = TaskCompilerFactory.getCompiler(conf, pctxt);
+        compiler.init(queryState, console, db);
+        compiler.compile(pctxt, rootTasks, inputs, outputs);
+        fetchTask = pctxt.getFetchTask();
+        //fetchTask = makeCopy(cachedPlan.getFetchTask(), cachedPlan.getFetchTask().getClass());

Review comment:
       This comment too.

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/ExecuteStatementAnalyzer.java
##########
@@ -286,6 +227,24 @@ public void analyzeInternal(ASTNode root) throws SemanticException {
       this.acidFileSinks.addAll(cachedPlan.getAcidFileSinks());
       this.initCtx(cachedPlan.getCtx());
       this.ctx.setCboInfo(cachedPlan.getCboInfo());
+      this.setLoadFileWork(cachedPlan.getLoadFileWork());
+      this.setLoadTableWork(cachedPlan.getLoadTableWork());
+
+      this.setQB(cachedPlan.getQB());
+
+      ParseContext pctxt = this.getParseContext();
+      // partition pruner
+      Transform ppr = new PartitionPruner();
+      ppr.transform(pctxt);
+
+      //pctxt.setQueryProperties(this.queryProperties);

Review comment:
       Same, can be removed?

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
##########
@@ -387,6 +387,12 @@
   protected volatile boolean disableJoinMerge = false;
   protected final boolean defaultJoinMerge;
 
+  /*
+   * This is used by prepare/execute statement
+   * Prepare/Execute requires operators to be copied and cached
+   */
+  protected Map<String, TableScanOperator> topOpsCopy = null;

Review comment:
       Why do you need to keep a copy instead of using the original operators? Could you leave a comment on that?

##########
File path: ql/src/test/results/clientpositive/llap/constprog_dpp.q.out
##########
@@ -84,12 +84,13 @@ Stage-0
                         Select Operator [SEL_40] (rows=1 width=4)
                           Output:["_col0"]
                           TableScan [TS_24] (rows=1 width=4)
-                            Output:["id"]
+                            default@tb2,tb2,Tbl:COMPLETE,Col:NONE,Output:["id"]

Review comment:
       I guess this was actually unrelated to current patch? Probably due to some data structure not being serialized when different union branches are copied?

##########
File path: ql/src/test/results/clientpositive/llap/constprog_dpp.q.out
##########
@@ -84,12 +84,13 @@ Stage-0
                         Select Operator [SEL_40] (rows=1 width=4)
                           Output:["_col0"]
                           TableScan [TS_24] (rows=1 width=4)
-                            Output:["id"]
+                            default@tb2,tb2,Tbl:COMPLETE,Col:NONE,Output:["id"]
                   <-Map 6 [CONTAINS] vectorized, llap
                     Reduce Output Operator [RS_45]
                       Limit [LIM_44] (rows=1 width=2)
                         Number of rows:1
                         Select Operator [SEL_43] (rows=1 width=0)
                           Output:["_col0"]
                           TableScan [TS_29] (rows=1 width=0)
+                            default@tb2,tb2,Tbl:PARTIAL,Col:COMPLETE

Review comment:
       This seems different than the stats above. Do you know if it is expected? May be worth checking in a follow-up.

##########
File path: ql/src/test/results/clientpositive/llap/prepare_plan.q.out
##########
@@ -170,10 +170,10 @@ STAGE PLANS:
                           sort order: 
                           Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                           value expressions: _col0 (type: bigint)
-            Execution mode: llap
+            Execution mode: vectorized, llap

Review comment:
       Nice!




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] jcamachor commented on a change in pull request #1472: HIVE-24009 Support partition pruning, vectorization and other physical transformations for EXECUTE statement

Posted by GitBox <gi...@apache.org>.
jcamachor commented on a change in pull request #1472:
URL: https://github.com/apache/hive/pull/1472#discussion_r492539012



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java
##########
@@ -63,19 +63,19 @@
 
   private VectorizationContext taskVectorizationContext;
 
-  protected transient JobConf jc;
-  private transient boolean inputFileChanged = false;
+  protected JobConf jc;

Review comment:
       Let's create a follow-up to explore whether some of them may be made transient again and discuss over there.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] jcamachor commented on a change in pull request #1472: HIVE-24009 Support partition pruning, vectorization and other physical transformations for EXECUTE statement

Posted by GitBox <gi...@apache.org>.
jcamachor commented on a change in pull request #1472:
URL: https://github.com/apache/hive/pull/1472#discussion_r491751273



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/ExecuteStatementAnalyzer.java
##########
@@ -253,14 +191,17 @@ public void analyzeInternal(ASTNode root) throws SemanticException {
     String queryName = getQueryName(root);
     if (ss.getPreparePlans().containsKey(queryName)) {
       // retrieve cached plan from session state
-      BaseSemanticAnalyzer cachedPlan = ss.getPreparePlans().get(queryName);
+      SemanticAnalyzer cachedPlan = ss.getPreparePlans().get(queryName);
 
       // make copy of the plan
-      createTaskCopy(cachedPlan);
+      //createTaskCopy(cachedPlan);

Review comment:
       Can remove line commented out.

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/PrepareStatementAnalyzer.java
##########
@@ -54,6 +58,21 @@ private void savePlan(String queryName) throws SemanticException{
     ss.getPreparePlans().put(queryName, this);
   }
 
+  private <T> T makeCopy(final Object task, Class<T> objClass) {
+    ByteArrayOutputStream baos = new ByteArrayOutputStream();

Review comment:
       Can we leave a comment on this method to understand what it is trying to do?

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java
##########
@@ -63,19 +63,19 @@
 
   private VectorizationContext taskVectorizationContext;
 
-  protected transient JobConf jc;
-  private transient boolean inputFileChanged = false;
+  protected JobConf jc;

Review comment:
       Do we need to keep all these fields for the plan cache in the operator, table, etc.? I am wondering about the implications of keeping them when the operator plan is serialized (i.e., whether that could have an performance impact). @t3rmin4t0r , @rbalamohan , could you comment on this?

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/ExecuteStatementAnalyzer.java
##########
@@ -286,6 +227,24 @@ public void analyzeInternal(ASTNode root) throws SemanticException {
       this.acidFileSinks.addAll(cachedPlan.getAcidFileSinks());
       this.initCtx(cachedPlan.getCtx());
       this.ctx.setCboInfo(cachedPlan.getCboInfo());
+      this.setLoadFileWork(cachedPlan.getLoadFileWork());
+      this.setLoadTableWork(cachedPlan.getLoadTableWork());
+
+      this.setQB(cachedPlan.getQB());
+
+      ParseContext pctxt = this.getParseContext();
+      // partition pruner
+      Transform ppr = new PartitionPruner();
+      ppr.transform(pctxt);
+
+      //pctxt.setQueryProperties(this.queryProperties);
+      if (!ctx.getExplainLogical()) {
+        TaskCompiler compiler = TaskCompilerFactory.getCompiler(conf, pctxt);
+        compiler.init(queryState, console, db);
+        compiler.compile(pctxt, rootTasks, inputs, outputs);
+        fetchTask = pctxt.getFetchTask();
+        //fetchTask = makeCopy(cachedPlan.getFetchTask(), cachedPlan.getFetchTask().getClass());

Review comment:
       This comment too.

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/ExecuteStatementAnalyzer.java
##########
@@ -286,6 +227,24 @@ public void analyzeInternal(ASTNode root) throws SemanticException {
       this.acidFileSinks.addAll(cachedPlan.getAcidFileSinks());
       this.initCtx(cachedPlan.getCtx());
       this.ctx.setCboInfo(cachedPlan.getCboInfo());
+      this.setLoadFileWork(cachedPlan.getLoadFileWork());
+      this.setLoadTableWork(cachedPlan.getLoadTableWork());
+
+      this.setQB(cachedPlan.getQB());
+
+      ParseContext pctxt = this.getParseContext();
+      // partition pruner
+      Transform ppr = new PartitionPruner();
+      ppr.transform(pctxt);
+
+      //pctxt.setQueryProperties(this.queryProperties);

Review comment:
       Same, can be removed?

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
##########
@@ -387,6 +387,12 @@
   protected volatile boolean disableJoinMerge = false;
   protected final boolean defaultJoinMerge;
 
+  /*
+   * This is used by prepare/execute statement
+   * Prepare/Execute requires operators to be copied and cached
+   */
+  protected Map<String, TableScanOperator> topOpsCopy = null;

Review comment:
       Why do you need to keep a copy instead of using the original operators? Could you leave a comment on that?

##########
File path: ql/src/test/results/clientpositive/llap/constprog_dpp.q.out
##########
@@ -84,12 +84,13 @@ Stage-0
                         Select Operator [SEL_40] (rows=1 width=4)
                           Output:["_col0"]
                           TableScan [TS_24] (rows=1 width=4)
-                            Output:["id"]
+                            default@tb2,tb2,Tbl:COMPLETE,Col:NONE,Output:["id"]

Review comment:
       I guess this was actually unrelated to current patch? Probably due to some data structure not being serialized when different union branches are copied?

##########
File path: ql/src/test/results/clientpositive/llap/constprog_dpp.q.out
##########
@@ -84,12 +84,13 @@ Stage-0
                         Select Operator [SEL_40] (rows=1 width=4)
                           Output:["_col0"]
                           TableScan [TS_24] (rows=1 width=4)
-                            Output:["id"]
+                            default@tb2,tb2,Tbl:COMPLETE,Col:NONE,Output:["id"]
                   <-Map 6 [CONTAINS] vectorized, llap
                     Reduce Output Operator [RS_45]
                       Limit [LIM_44] (rows=1 width=2)
                         Number of rows:1
                         Select Operator [SEL_43] (rows=1 width=0)
                           Output:["_col0"]
                           TableScan [TS_29] (rows=1 width=0)
+                            default@tb2,tb2,Tbl:PARTIAL,Col:COMPLETE

Review comment:
       This seems different than the stats above. Do you know if it is expected? May be worth checking in a follow-up.

##########
File path: ql/src/test/results/clientpositive/llap/prepare_plan.q.out
##########
@@ -170,10 +170,10 @@ STAGE PLANS:
                           sort order: 
                           Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                           value expressions: _col0 (type: bigint)
-            Execution mode: llap
+            Execution mode: vectorized, llap

Review comment:
       Nice!




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] vineetgarg02 commented on a change in pull request #1472: HIVE-24009 Support partition pruning, vectorization and other physical transformations for EXECUTE statement

Posted by GitBox <gi...@apache.org>.
vineetgarg02 commented on a change in pull request #1472:
URL: https://github.com/apache/hive/pull/1472#discussion_r492240115



##########
File path: ql/src/test/results/clientpositive/llap/constprog_dpp.q.out
##########
@@ -84,12 +84,13 @@ Stage-0
                         Select Operator [SEL_40] (rows=1 width=4)
                           Output:["_col0"]
                           TableScan [TS_24] (rows=1 width=4)
-                            Output:["id"]
+                            default@tb2,tb2,Tbl:COMPLETE,Col:NONE,Output:["id"]
                   <-Map 6 [CONTAINS] vectorized, llap
                     Reduce Output Operator [RS_45]
                       Limit [LIM_44] (rows=1 width=2)
                         Number of rows:1
                         Select Operator [SEL_43] (rows=1 width=0)
                           Output:["_col0"]
                           TableScan [TS_29] (rows=1 width=0)
+                            default@tb2,tb2,Tbl:PARTIAL,Col:COMPLETE

Review comment:
       I confirmed that this is expected. I compared this plan against master (with explain.user set to false) and there is no difference in the plan.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] vineetgarg02 merged pull request #1472: HIVE-24009 Support partition pruning, vectorization and other physical transformations for EXECUTE statement

Posted by GitBox <gi...@apache.org>.
vineetgarg02 merged pull request #1472:
URL: https://github.com/apache/hive/pull/1472


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] vineetgarg02 commented on a change in pull request #1472: HIVE-24009 Support partition pruning, vectorization and other physical transformations for EXECUTE statement

Posted by GitBox <gi...@apache.org>.
vineetgarg02 commented on a change in pull request #1472:
URL: https://github.com/apache/hive/pull/1472#discussion_r492229067



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
##########
@@ -387,6 +387,12 @@
   protected volatile boolean disableJoinMerge = false;
   protected final boolean defaultJoinMerge;
 
+  /*
+   * This is used by prepare/execute statement
+   * Prepare/Execute requires operators to be copied and cached
+   */
+  protected Map<String, TableScanOperator> topOpsCopy = null;

Review comment:
       Original operator tree shape is changed when going through physical transformations and task generation (don't know why though), as a result this operator tree can not be used later to regenerate tasks or re-running physical transformations. Therefore we make a copy and cache it after operator tree is generated.
   I will leave a comment.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] vineetgarg02 commented on a change in pull request #1472: HIVE-24009 Support partition pruning, vectorization and other physical transformations for EXECUTE statement

Posted by GitBox <gi...@apache.org>.
vineetgarg02 commented on a change in pull request #1472:
URL: https://github.com/apache/hive/pull/1472#discussion_r492224561



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java
##########
@@ -63,19 +63,19 @@
 
   private VectorizationContext taskVectorizationContext;
 
-  protected transient JobConf jc;
-  private transient boolean inputFileChanged = false;
+  protected JobConf jc;

Review comment:
       I actually tried not keeping these fields but I was running into all sorts of issues like unable to serialize/de-serialize or plan generating without metadata etc. 
   I am not sure if we need to keep all of these fields or we can selectively choose, I went by almost all in interest of time. If Gopal or Rajesh thinks that this may cause performance issue I can open a follow-up to investigate and choose fields selectively.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] jcamachor commented on a change in pull request #1472: HIVE-24009 Support partition pruning, vectorization and other physical transformations for EXECUTE statement

Posted by GitBox <gi...@apache.org>.
jcamachor commented on a change in pull request #1472:
URL: https://github.com/apache/hive/pull/1472#discussion_r492539012



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java
##########
@@ -63,19 +63,19 @@
 
   private VectorizationContext taskVectorizationContext;
 
-  protected transient JobConf jc;
-  private transient boolean inputFileChanged = false;
+  protected JobConf jc;

Review comment:
       Let's create a follow-up to explore whether some of them may be made transient again and discuss over there.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] vineetgarg02 commented on a change in pull request #1472: HIVE-24009 Support partition pruning, vectorization and other physical transformations for EXECUTE statement

Posted by GitBox <gi...@apache.org>.
vineetgarg02 commented on a change in pull request #1472:
URL: https://github.com/apache/hive/pull/1472#discussion_r493793444



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java
##########
@@ -63,19 +63,19 @@
 
   private VectorizationContext taskVectorizationContext;
 
-  protected transient JobConf jc;
-  private transient boolean inputFileChanged = false;
+  protected JobConf jc;

Review comment:
       I have updated HIVE-24005 to investigate this.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] vineetgarg02 commented on a change in pull request #1472: HIVE-24009 Support partition pruning, vectorization and other physical transformations for EXECUTE statement

Posted by GitBox <gi...@apache.org>.
vineetgarg02 commented on a change in pull request #1472:
URL: https://github.com/apache/hive/pull/1472#discussion_r492229758



##########
File path: ql/src/test/results/clientpositive/llap/constprog_dpp.q.out
##########
@@ -84,12 +84,13 @@ Stage-0
                         Select Operator [SEL_40] (rows=1 width=4)
                           Output:["_col0"]
                           TableScan [TS_24] (rows=1 width=4)
-                            Output:["id"]
+                            default@tb2,tb2,Tbl:COMPLETE,Col:NONE,Output:["id"]

Review comment:
       Yeah I think this is likely side effect of some changes in w.r.t serialization/de-serialization. Although this is positive side effect now that we have more information in explain plan.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] vineetgarg02 commented on a change in pull request #1472: HIVE-24009 Support partition pruning, vectorization and other physical transformations for EXECUTE statement

Posted by GitBox <gi...@apache.org>.
vineetgarg02 commented on a change in pull request #1472:
URL: https://github.com/apache/hive/pull/1472#discussion_r488207028



##########
File path: ql/src/test/queries/clientnegative/prepare_execute_1.q
##########
@@ -1,3 +0,0 @@
---! qt:dataset:src
-prepare query1 from select count(*) from src where key > ? and value < ? group by ?;
-execute query1 using 1,100,1;

Review comment:
       This query no longer fails. I have opened a follow-up to fix this https://issues.apache.org/jira/browse/HIVE-24164




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] vineetgarg02 commented on a change in pull request #1472: HIVE-24009 Support partition pruning, vectorization and other physical transformations for EXECUTE statement

Posted by GitBox <gi...@apache.org>.
vineetgarg02 commented on a change in pull request #1472:
URL: https://github.com/apache/hive/pull/1472#discussion_r492224561



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java
##########
@@ -63,19 +63,19 @@
 
   private VectorizationContext taskVectorizationContext;
 
-  protected transient JobConf jc;
-  private transient boolean inputFileChanged = false;
+  protected JobConf jc;

Review comment:
       I actually tried not keeping these fields but I was running into all sorts of issues like unable to serialize/de-serialize or plan generating without metadata etc. 
   I am not sure if we need to keep all of these fields or we can selectively choose, I went by almost all in interest of time. If Gopal or Rajesh thinks that this may cause performance issue I can open a follow-up to investigate and choose fields selectively.

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
##########
@@ -387,6 +387,12 @@
   protected volatile boolean disableJoinMerge = false;
   protected final boolean defaultJoinMerge;
 
+  /*
+   * This is used by prepare/execute statement
+   * Prepare/Execute requires operators to be copied and cached
+   */
+  protected Map<String, TableScanOperator> topOpsCopy = null;

Review comment:
       Original operator tree shape is changed when going through physical transformations and task generation (don't know why though), as a result this operator tree can not be used later to regenerate tasks or re-running physical transformations. Therefore we make a copy and cache it after operator tree is generated.
   I will leave a comment.

##########
File path: ql/src/test/results/clientpositive/llap/constprog_dpp.q.out
##########
@@ -84,12 +84,13 @@ Stage-0
                         Select Operator [SEL_40] (rows=1 width=4)
                           Output:["_col0"]
                           TableScan [TS_24] (rows=1 width=4)
-                            Output:["id"]
+                            default@tb2,tb2,Tbl:COMPLETE,Col:NONE,Output:["id"]

Review comment:
       Yeah I think this is likely side effect of some changes in w.r.t serialization/de-serialization. Although this is positive side effect now that we have more information in explain plan.

##########
File path: ql/src/test/results/clientpositive/llap/constprog_dpp.q.out
##########
@@ -84,12 +84,13 @@ Stage-0
                         Select Operator [SEL_40] (rows=1 width=4)
                           Output:["_col0"]
                           TableScan [TS_24] (rows=1 width=4)
-                            Output:["id"]
+                            default@tb2,tb2,Tbl:COMPLETE,Col:NONE,Output:["id"]
                   <-Map 6 [CONTAINS] vectorized, llap
                     Reduce Output Operator [RS_45]
                       Limit [LIM_44] (rows=1 width=2)
                         Number of rows:1
                         Select Operator [SEL_43] (rows=1 width=0)
                           Output:["_col0"]
                           TableScan [TS_29] (rows=1 width=0)
+                            default@tb2,tb2,Tbl:PARTIAL,Col:COMPLETE

Review comment:
       I confirmed that this is expected. I compared this plan against master (with explain.user set to false) and there is no difference in the plan.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org