Posted to gitbox@hive.apache.org by GitBox <gi...@apache.org> on 2020/07/16 12:36:12 UTC

[GitHub] [hive] bmaidics opened a new pull request #1263: HIVE-23849: Hive skips the creation of ColumnAccessInfo when creating a view

bmaidics opened a new pull request #1263:
URL: https://github.com/apache/hive/pull/1263


   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pvary commented on a change in pull request #1263: HIVE-23849: Hive skips the creation of ColumnAccessInfo when creating a view

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #1263:
URL: https://github.com/apache/hive/pull/1263#discussion_r457959253



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
##########
@@ -12566,38 +12565,44 @@ void analyzeInternal(ASTNode ast, Supplier<PlannerContext> pcf) throws SemanticE
       createVwDesc.setTablesUsed(getTablesUsed(pCtx));
     }
 
-    // 6. Generate table access stats if required
-    if (HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_TABLEKEYS)) {
-      TableAccessAnalyzer tableAccessAnalyzer = new TableAccessAnalyzer(pCtx);
-      setTableAccessInfo(tableAccessAnalyzer.analyzeTableAccess());
-    }
+    //If we're creating views and ColumnAccessInfo is already created, we should not run these.
+    if(!forViewCreation ||  getColumnAccessInfo() == null) {

Review comment:
      nit: missing space after `if`.






[GitHub] [hive] jcamachor merged pull request #1263: HIVE-23849: Hive skips the creation of ColumnAccessInfo when creating a view

jcamachor merged pull request #1263:
URL: https://github.com/apache/hive/pull/1263


   




[GitHub] [hive] jcamachor commented on a change in pull request #1263: HIVE-23849: Hive skips the creation of ColumnAccessInfo when creating a view

jcamachor commented on a change in pull request #1263:
URL: https://github.com/apache/hive/pull/1263#discussion_r458335606



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
##########
@@ -12566,38 +12565,44 @@ void analyzeInternal(ASTNode ast, Supplier<PlannerContext> pcf) throws SemanticE
       createVwDesc.setTablesUsed(getTablesUsed(pCtx));
     }
 
-    // 6. Generate table access stats if required
-    if (HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_TABLEKEYS)) {
-      TableAccessAnalyzer tableAccessAnalyzer = new TableAccessAnalyzer(pCtx);
-      setTableAccessInfo(tableAccessAnalyzer.analyzeTableAccess());
-    }
+    //If we're creating views and ColumnAccessInfo is already created, we should not run these.
+    if(!forViewCreation ||  getColumnAccessInfo() == null) {
+      // 6. Generate table access stats if required
+      if (HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_TABLEKEYS)) {
+        TableAccessAnalyzer tableAccessAnalyzer = new TableAccessAnalyzer(pCtx);
+        setTableAccessInfo(tableAccessAnalyzer.analyzeTableAccess());
+      }
+      AuxOpTreeSignature.linkAuxSignatures(pCtx);
+      // 7. Perform Logical optimization
+      if (LOG.isDebugEnabled()) {
+        LOG.debug("Before logical optimization\n" + Operator.toString(pCtx.getTopOps().values()));
+      }
+      Optimizer optm = new Optimizer();
+      optm.setPctx(pCtx);
+      optm.initialize(conf);
+      pCtx = optm.optimize();
+      if (pCtx.getColumnAccessInfo() != null) {
+        // set ColumnAccessInfo for view column authorization
+        setColumnAccessInfo(pCtx.getColumnAccessInfo());
+      }
+      if (LOG.isDebugEnabled()) {
+        LOG.debug("After logical optimization\n" + Operator.toString(pCtx.getTopOps().values()));
+      }
 
-    AuxOpTreeSignature.linkAuxSignatures(pCtx);
-    // 7. Perform Logical optimization
-    if (LOG.isDebugEnabled()) {
-      LOG.debug("Before logical optimization\n" + Operator.toString(pCtx.getTopOps().values()));
-    }
-    Optimizer optm = new Optimizer();
-    optm.setPctx(pCtx);
-    optm.initialize(conf);
-    pCtx = optm.optimize();
-    if (pCtx.getColumnAccessInfo() != null) {
-      // set ColumnAccessInfo for view column authorization
-      setColumnAccessInfo(pCtx.getColumnAccessInfo());
+      // 8. Generate column access stats if required - wait until column pruning

Review comment:
       Can we add information to the comment above about why we are skipping those specific steps 6-8?

##########
File path: ql/src/test/results/clientpositive/llap/ppd_deterministic_expr.q.out
##########
@@ -198,6 +198,9 @@ PREHOOK: query: create view viewDeterministicUDFA partitioned on (vpart1, vpart2
 where part1 in ('US', 'CA')
 PREHOOK: type: CREATEVIEW
 PREHOOK: Input: default@testa
+PREHOOK: Input: default@testa@part1=CA/part2=ABC/part3=300

Review comment:
      This is a CREATE VIEW. Why are partitions that are not accessed included in the accessed entities? This should not have changed, should it? The same thing happens in several different tests, so the root cause is probably the same.






[GitHub] [hive] bmaidics commented on a change in pull request #1263: HIVE-23849: Hive skips the creation of ColumnAccessInfo when creating a view

bmaidics commented on a change in pull request #1263:
URL: https://github.com/apache/hive/pull/1263#discussion_r458643492



##########
File path: ql/src/test/results/clientpositive/llap/ppd_deterministic_expr.q.out
##########
@@ -198,6 +198,9 @@ PREHOOK: query: create view viewDeterministicUDFA partitioned on (vpart1, vpart2
 where part1 in ('US', 'CA')
 PREHOOK: type: CREATEVIEW
 PREHOOK: Input: default@testa
+PREHOOK: Input: default@testa@part1=CA/part2=ABC/part3=300

Review comment:
      @jcamachor, I think this behavior is expected. CBO is disabled in this test, so ColumnAccessInfo is null, and after my change steps 6-8 therefore run on view creation. Step 7 is the optimizer, which runs SimpleFetchOptimizer, and that adds these partitions as inputs to read from. Does this answer your concern, or did I misunderstand your question?
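The condition discussed above can be checked in isolation. The following is a minimal, hypothetical Java model of the guard `!forViewCreation || getColumnAccessInfo() == null`; the class `ViewCreationGuard` and the method `runsSteps6To8` are illustrative names and not part of Hive, and a boolean stands in for whether ColumnAccessInfo was already set before this point (per the comment above, it is not when CBO is disabled).

```java
// Hypothetical model of the HIVE-23849 guard; illustrative names only, not Hive code.
public class ViewCreationGuard {

    /**
     * Mirrors: if (!forViewCreation || getColumnAccessInfo() == null) { steps 6-8 }
     * columnAccessInfoPresent stands in for getColumnAccessInfo() != null.
     */
    static boolean runsSteps6To8(boolean forViewCreation, boolean columnAccessInfoPresent) {
        return !forViewCreation || !columnAccessInfoPresent;
    }

    public static void main(String[] args) {
        boolean[] values = {false, true};
        // Print all four combinations of the two inputs.
        for (boolean forView : values) {
            for (boolean present : values) {
                System.out.printf("forViewCreation=%b columnAccessInfoPresent=%b -> runs steps 6-8: %b%n",
                        forView, present, runsSteps6To8(forView, present));
            }
        }
    }
}
```

Under this model, only a view creation that already has ColumnAccessInfo skips steps 6-8. A CBO-disabled CREATE VIEW, as in `ppd_deterministic_expr.q`, falls into the row where ColumnAccessInfo is absent, so the optimizer runs.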






[GitHub] [hive] bmaidics commented on a change in pull request #1263: HIVE-23849: Hive skips the creation of ColumnAccessInfo when creating a view

bmaidics commented on a change in pull request #1263:
URL: https://github.com/apache/hive/pull/1263#discussion_r458748457



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
##########
@@ -12566,38 +12565,44 @@ void analyzeInternal(ASTNode ast, Supplier<PlannerContext> pcf) throws SemanticE
       createVwDesc.setTablesUsed(getTablesUsed(pCtx));
     }
 
-    // 6. Generate table access stats if required
-    if (HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_TABLEKEYS)) {
-      TableAccessAnalyzer tableAccessAnalyzer = new TableAccessAnalyzer(pCtx);
-      setTableAccessInfo(tableAccessAnalyzer.analyzeTableAccess());
-    }
+    //If we're creating views and ColumnAccessInfo is already created, we should not run these.
+    if(!forViewCreation ||  getColumnAccessInfo() == null) {
+      // 6. Generate table access stats if required
+      if (HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_TABLEKEYS)) {
+        TableAccessAnalyzer tableAccessAnalyzer = new TableAccessAnalyzer(pCtx);
+        setTableAccessInfo(tableAccessAnalyzer.analyzeTableAccess());
+      }
+      AuxOpTreeSignature.linkAuxSignatures(pCtx);
+      // 7. Perform Logical optimization
+      if (LOG.isDebugEnabled()) {
+        LOG.debug("Before logical optimization\n" + Operator.toString(pCtx.getTopOps().values()));
+      }
+      Optimizer optm = new Optimizer();
+      optm.setPctx(pCtx);
+      optm.initialize(conf);
+      pCtx = optm.optimize();
+      if (pCtx.getColumnAccessInfo() != null) {
+        // set ColumnAccessInfo for view column authorization
+        setColumnAccessInfo(pCtx.getColumnAccessInfo());
+      }
+      if (LOG.isDebugEnabled()) {
+        LOG.debug("After logical optimization\n" + Operator.toString(pCtx.getTopOps().values()));
+      }
 
-    AuxOpTreeSignature.linkAuxSignatures(pCtx);
-    // 7. Perform Logical optimization
-    if (LOG.isDebugEnabled()) {
-      LOG.debug("Before logical optimization\n" + Operator.toString(pCtx.getTopOps().values()));
-    }
-    Optimizer optm = new Optimizer();
-    optm.setPctx(pCtx);
-    optm.initialize(conf);
-    pCtx = optm.optimize();
-    if (pCtx.getColumnAccessInfo() != null) {
-      // set ColumnAccessInfo for view column authorization
-      setColumnAccessInfo(pCtx.getColumnAccessInfo());
+      // 8. Generate column access stats if required - wait until column pruning

Review comment:
      Thanks, @jcamachor. Added a more specific comment.






[GitHub] [hive] bmaidics commented on a change in pull request #1263: HIVE-23849: Hive skips the creation of ColumnAccessInfo when creating a view

bmaidics commented on a change in pull request #1263:
URL: https://github.com/apache/hive/pull/1263#discussion_r458640085



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
##########
@@ -12566,38 +12565,44 @@ void analyzeInternal(ASTNode ast, Supplier<PlannerContext> pcf) throws SemanticE
       createVwDesc.setTablesUsed(getTablesUsed(pCtx));
     }
 
-    // 6. Generate table access stats if required
-    if (HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_TABLEKEYS)) {
-      TableAccessAnalyzer tableAccessAnalyzer = new TableAccessAnalyzer(pCtx);
-      setTableAccessInfo(tableAccessAnalyzer.analyzeTableAccess());
-    }
+    //If we're creating views and ColumnAccessInfo is already created, we should not run these.
+    if(!forViewCreation ||  getColumnAccessInfo() == null) {

Review comment:
      Thanks, @pvary.



