Posted to gitbox@hive.apache.org by "scarlin-cloudera (via GitHub)" <gi...@apache.org> on 2023/05/31 21:19:59 UTC

[GitHub] [hive] scarlin-cloudera opened a new pull request, #4378: HIVE-27391: Refactor Calcite node generation from lateral views

scarlin-cloudera opened a new pull request, #4378:
URL: https://github.com/apache/hive/pull/4378

   This commit is a first step toward generating all lateral views in CBO. Currently, only "inline(array())" UDTFs are converted into a CBO plan.
   
   The lateral view generation has been extracted from CalcitePlanner and placed in a standalone class, with the hope that we can trim CalcitePlanner down from a 5000-line class to something more manageable.
   
   In this commit, no "q" test output has changed. Some CBO statements were added to tablevalues.q just to ensure that CBO processes these queries correctly.
   
   The SemanticAnalyzer changes are small: a few utility methods were made static so they can be accessed from LateralViewPlan. As future cleanup, these static methods should be moved out of SemanticAnalyzer to thin down the 15000+ line file.
   
   The same is true for the CalcitePlanner "gen*RexNode*" methods, which are also now static and accessible from LateralViewPlan.
   
   A new rule was also created as a precursor to adding CBO support for all lateral views. The only lateral view currently supported is the "inline(array())" lateral view. The special quality of this UDTF is that the lateral view does not need a lateral view operator: the table is generated by the inline UDTF and joined with the base (input) table fields. The initial implementation hacked this to work by placing the base table fields within the "inline(array())" structs, so that the top-level joined fields are already inside the array and no join is needed. This commit moves that special case into a rule that performs the manipulation, which will allow a follow-up commit to avoid special-case code in the LateralViewPlan object.
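   
   To illustrate, here is a minimal sketch of the struct rewrite the new rule performs; the class and method names below are illustrative only, and the real implementation lives in HiveOptimizeInlineArrayTableFunctionRule:
   
   ```java
   import java.util.ArrayList;
   import java.util.List;
   
   import org.apache.calcite.rex.RexBuilder;
   import org.apache.calcite.rex.RexCall;
   import org.apache.calcite.rex.RexInputRef;
   import org.apache.calcite.rex.RexNode;
   import org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveTableFunctionScan;
   
   import com.google.common.collect.Lists;
   
   final class InlineArrayRewriteSketch {
     // Rebuilds array(struct(col2, col3), ...) as array(struct(col1, col2, col3), ...)
     // by prepending a RexInputRef for every field of the input rel to each struct.
     static RexNode prependInputRefs(HiveTableFunctionScan tfs, RexBuilder rexBuilder) {
       List<RexNode> inputRefs = Lists.transform(tfs.getInput(0).getRowType().getFieldList(),
           field -> new RexInputRef(field.getIndex(), field.getType()));
   
       RexCall inlineCall = (RexCall) tfs.getCall();
       RexCall arrayCall = (RexCall) inlineCall.getOperands().get(0);
   
       List<RexNode> newStructs = new ArrayList<>();
       for (RexNode structOperand : arrayCall.getOperands()) {
         RexCall structCall = (RexCall) structOperand;
         List<RexNode> operands = new ArrayList<>(inputRefs);   // base (input) table fields first
         operands.addAll(structCall.getOperands());             // then the original struct fields
         newStructs.add(rexBuilder.makeCall(structCall.op, operands));
       }
   
       // The rule then rebuilds the HiveTableFunctionScan around inline(<this new array>).
       return rexBuilder.makeCall(arrayCall.op, newStructs);
     }
   }
   ```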
   
   
   ### What changes were proposed in this pull request?
   The relevant information is in the commit message above.
   
   ### Why are the changes needed?
   This is part of the umbrella Jira HIVE-27390, which will eventually enable CBO support for all lateral views.
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   Existing tests are largely sufficient because this is only a refactor, but "Explain CBO" plans were also added to verify that CBO succeeds.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kasakrisz merged pull request #4378: HIVE-27391: Refactor Calcite node generation from lateral views

Posted by "kasakrisz (via GitHub)" <gi...@apache.org>.
kasakrisz merged PR #4378:
URL: https://github.com/apache/hive/pull/4378



[GitHub] [hive] sonarcloud[bot] commented on pull request #4378: HIVE-27391: Refactor Calcite node generation from lateral views

Posted by "sonarcloud[bot] (via GitHub)" <gi...@apache.org>.
sonarcloud[bot] commented on PR #4378:
URL: https://github.com/apache/hive/pull/4378#issuecomment-1574985504

   Kudos, SonarCloud Quality Gate passed! (https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=4378)
   
   0 Bugs, 0 Vulnerabilities, 0 Security Hotspots, 13 Code Smells (all rated A).
   No Coverage information. No Duplication information.
   
   



[GitHub] [hive] scarlin-cloudera commented on a diff in pull request #4378: HIVE-27391: Refactor Calcite node generation from lateral views

Posted by "scarlin-cloudera (via GitHub)" <gi...@apache.org>.
scarlin-cloudera commented on code in PR #4378:
URL: https://github.com/apache/hive/pull/4378#discussion_r1222060404


##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveOptimizeInlineArrayTableFunctionRule.java:
##########
@@ -0,0 +1,128 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.optimizer.calcite.rules;
+
+import org.apache.calcite.plan.RelOptCluster;
+import org.apache.calcite.plan.RelOptRule;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.calcite.rel.type.RelDataTypeField;
+import org.apache.calcite.rex.RexCall;
+import org.apache.calcite.rex.RexInputRef;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.tools.RelBuilderFactory;
+import org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelFactories;
+import org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveTableFunctionScan;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.Lists;
+
+import java.util.ArrayList;
+import java.util.List;
+
+/**
+ * This rule optimizes the inline udtf in a HiveTableFunctionScan when it
+ * has an array of structures. The RelNode for a HiveTableFunctionScan places
+ * the input references as the first elements in the return type followed by
+ * the udtf return value which represents the items in the generated table. Take the
+ * case where the base (input) table has col1, and the inline function is represented by:
+ * inline(array( struct1(col2, col3), struct2(col2, col3), struct3(col2, col3), etc...)),
+ * ...and the return value for the table scan node is (col1, col2, col3). In this case,
+ * the same col1 value is joined with the structures within the inline array for the
+ * col2 and col3 values.
+ *
+ * The optimization is to put the "col1" value within the inline array, resulting
+ * in the new structure:
+ * inline(array(struct1(col1, col2, col3), struct2(col1, col2, col3), ...))
+ * By doing this, we avoid creating a lateral view join operator and a lateral view forward
+ * operator at runtime.
+ */
+public class HiveOptimizeInlineArrayTableFunctionRule extends RelOptRule {
+
+  public static final HiveOptimizeInlineArrayTableFunctionRule INSTANCE =
+          new HiveOptimizeInlineArrayTableFunctionRule(HiveRelFactories.HIVE_BUILDER);
+
+  public HiveOptimizeInlineArrayTableFunctionRule(RelBuilderFactory relBuilderFactory) {
+    super(operand(HiveTableFunctionScan.class, any()), relBuilderFactory, null);
+  }
+
+  @Override
+  public boolean matches(RelOptRuleCall call) {
+    final HiveTableFunctionScan tableFunctionScanRel = call.rel(0);
+
+    Preconditions.checkState(tableFunctionScanRel.getCall() instanceof RexCall);
+    RexCall udtfCall = (RexCall) tableFunctionScanRel.getCall();
+    if (!udtfCall.getOperator().getName().equalsIgnoreCase("inline")) {
+      return false;
+    }
+
+    Preconditions.checkState(!udtfCall.getOperands().isEmpty());
+    RexNode operand = udtfCall.getOperands().get(0);
+    if (!(operand instanceof RexCall)) {
+      return false;
+    }
+    RexCall firstOperand = (RexCall) operand;
+    if (!firstOperand.getOperator().getName().equalsIgnoreCase("array")) {
+      return false;
+    }
+    Preconditions.checkState(!firstOperand.getOperands().isEmpty());
+    int numStructParams = firstOperand.getOperands().get(0).getType().getFieldCount();
+
+    if (tableFunctionScanRel.getRowType().getFieldCount() == numStructParams) {
+      return false;
+    }
+
+    return true;
+  }
+
+  public void onMatch(RelOptRuleCall call) {
+    final HiveTableFunctionScan tfs = call.rel(0);
+    RelNode inputRel = tfs.getInput(0);
+    RexCall inlineCall = (RexCall) tfs.getCall();
+    RexCall arrayCall = (RexCall) inlineCall.getOperands().get(0);
+    RelOptCluster cluster = tfs.getCluster();
+
+    List<RexNode> inputRefs = Lists.transform(inputRel.getRowType().getFieldList(),
+        input -> new RexInputRef(input.getIndex(), input.getType()));
+    List<RexNode> newStructExprs = new ArrayList<>();
+
+    for (RexNode currentStructOperand : arrayCall.getOperands()) {
+      List<RexNode> allOperands = new ArrayList<>(inputRefs);
+      RexCall structCall = (RexCall) currentStructOperand;
+      allOperands.addAll(structCall.getOperands());
+      newStructExprs.add(cluster.getRexBuilder().makeCall(structCall.op, allOperands));
+    }
+
+    List<RelDataType> returnTypes = new ArrayList<>(
+        Lists.transform(inputRel.getRowType().getFieldList(), RelDataTypeField::getType));
+    RexCall firstStructCall = (RexCall) arrayCall.getOperands().get(0);
+    returnTypes.addAll(Lists.transform(firstStructCall.getOperands(), RexNode::getType));
+
+    List<RexNode> newArrayCall =
+        Lists.newArrayList(cluster.getRexBuilder().makeCall(arrayCall.op, newStructExprs));

Review Comment:
   Done




[GitHub] [hive] scarlin-cloudera commented on a diff in pull request #4378: HIVE-27391: Refactor Calcite node generation from lateral views

Posted by "scarlin-cloudera (via GitHub)" <gi...@apache.org>.
scarlin-cloudera commented on code in PR #4378:
URL: https://github.com/apache/hive/pull/4378#discussion_r1226644477


##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveOptimizeInlineArrayTableFunctionRule.java:
##########
@@ -0,0 +1,128 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.optimizer.calcite.rules;
+
+import org.apache.calcite.plan.RelOptCluster;
+import org.apache.calcite.plan.RelOptRule;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.calcite.rel.type.RelDataTypeField;
+import org.apache.calcite.rex.RexCall;
+import org.apache.calcite.rex.RexInputRef;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.tools.RelBuilderFactory;
+import org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelFactories;
+import org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveTableFunctionScan;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.Lists;
+
+import java.util.ArrayList;
+import java.util.List;
+
+/**
+ * This rule optimizes the inline udtf in a HiveTableFunctionScan when it
+ * has an array of structures. The RelNode for a HiveTableFunctionScan places
+ * the input references as the first elements in the return type followed by
+ * the udtf return value which represents the items in the generated table. Take the
+ * case where the base (input) table has col1, and the inline function is represented by:
+ * inline(array( struct1(col2, col3), struct2(col2, col3), struct3(col2, col3), etc...)),
+ * ...and the return value for the table scan node is (col1, col2, col3). In this case,
+ * the same col1 value is joined with the structures within the inline array for the
+ * col2 and col3 values.
+ *
+ * The optimization is to put the "col1" value within the inline array, resulting
+ * in the new structure:
+ * inline(array(struct1(col1, col2, col3), struct2(col1, col2, col3), ...))
+ * By doing this, we avoid creating a lateral view join operator and a lateral view forward
+ * operator at runtime.
+ */
+public class HiveOptimizeInlineArrayTableFunctionRule extends RelOptRule {
+
+  public static final HiveOptimizeInlineArrayTableFunctionRule INSTANCE =
+          new HiveOptimizeInlineArrayTableFunctionRule(HiveRelFactories.HIVE_BUILDER);
+
+  public HiveOptimizeInlineArrayTableFunctionRule(RelBuilderFactory relBuilderFactory) {
+    super(operand(HiveTableFunctionScan.class, any()), relBuilderFactory, null);
+  }
+
+  @Override
+  public boolean matches(RelOptRuleCall call) {
+    final HiveTableFunctionScan tableFunctionScanRel = call.rel(0);
+
+    Preconditions.checkState(tableFunctionScanRel.getCall() instanceof RexCall);
+    RexCall udtfCall = (RexCall) tableFunctionScanRel.getCall();
+    if (!udtfCall.getOperator().getName().equalsIgnoreCase("inline")) {

Review Comment:
   What I'm saying is that I don't like having to use the "reverseOperatorMap" to get it.  This line within SqlFunctionConverter:
   
     registerFunction("array", SqlStdOperatorTable.ARRAY_VALUE_CONSTRUCTOR, hToken(HiveParser.Identifier, "array"));
     
     ...should have been...
     
      registerFunction(ARRAY_FUNCTION, SqlStdOperatorTable.ARRAY_VALUE_CONSTRUCTOR, hToken(HiveParser.Identifier, ARRAY_FUNCTION));
     
     ...which would have avoided the awkward call to reverseOperatorMap.  I can change it for this one constant easily enough, but the other strings passed to registerFunction don't use constants.
     
     OK... I'm talking myself into changing it into a constant, though.  We can change the other registerFunction calls later?
   




[GitHub] [hive] scarlin-cloudera commented on a diff in pull request #4378: HIVE-27391: Refactor Calcite node generation from lateral views

Posted by "scarlin-cloudera (via GitHub)" <gi...@apache.org>.
scarlin-cloudera commented on code in PR #4378:
URL: https://github.com/apache/hive/pull/4378#discussion_r1226679244


##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveOptimizeInlineArrayTableFunctionRule.java:
##########
@@ -0,0 +1,128 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.optimizer.calcite.rules;
+
+import org.apache.calcite.plan.RelOptCluster;
+import org.apache.calcite.plan.RelOptRule;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.calcite.rel.type.RelDataTypeField;
+import org.apache.calcite.rex.RexCall;
+import org.apache.calcite.rex.RexInputRef;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.tools.RelBuilderFactory;
+import org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelFactories;
+import org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveTableFunctionScan;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.Lists;
+
+import java.util.ArrayList;
+import java.util.List;
+
+/**
+ * This rule optimizes the inline udtf in a HiveTableFunctionScan when it
+ * has an array of structures. The RelNode for a HiveTableFunctionScan places
+ * the input references as the first elements in the return type followed by
+ * the udtf return value which represents the items in the generated table. Take the
+ * case where the base (input) table has col1, and the inline function is represented by:
+ * inline(array( struct1(col2, col3), struct2(col2, col3), struct3(col2, col3), etc...)),
+ * ...and the return value for the table scan node is (col1, col2, col3). In this case,
+ * the same col1 value is joined with the structures within the inline array for the
+ * col2 and col3 values.
+ *
+ * The optimization is to put the "col1" value within the inline array, resulting
+ * in the new structure:
+ * inline(array(struct1(col1, col2, col3), struct2(col1, col2, col3), ...))
+ * By doing this, we avoid creating a lateral view join operator and a lateral view forward
+ * operator at runtime.
+ */
+public class HiveOptimizeInlineArrayTableFunctionRule extends RelOptRule {
+
+  public static final HiveOptimizeInlineArrayTableFunctionRule INSTANCE =
+          new HiveOptimizeInlineArrayTableFunctionRule(HiveRelFactories.HIVE_BUILDER);
+
+  public HiveOptimizeInlineArrayTableFunctionRule(RelBuilderFactory relBuilderFactory) {
+    super(operand(HiveTableFunctionScan.class, any()), relBuilderFactory, null);
+  }
+
+  @Override
+  public boolean matches(RelOptRuleCall call) {
+    final HiveTableFunctionScan tableFunctionScanRel = call.rel(0);
+
+    Preconditions.checkState(tableFunctionScanRel.getCall() instanceof RexCall);
+    RexCall udtfCall = (RexCall) tableFunctionScanRel.getCall();
+    if (!udtfCall.getOperator().getName().equalsIgnoreCase("inline")) {

Review Comment:
   Just made the change.  I don't feel so bad about adding it in FunctionRegistry since there is precedent there for other function-name constants.




[GitHub] [hive] scarlin-cloudera commented on a diff in pull request #4378: HIVE-27391: Refactor Calcite node generation from lateral views

Posted by "scarlin-cloudera (via GitHub)" <gi...@apache.org>.
scarlin-cloudera commented on code in PR #4378:
URL: https://github.com/apache/hive/pull/4378#discussion_r1222053381


##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveOptimizeInlineArrayTableFunctionRule.java:
##########
@@ -0,0 +1,128 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.optimizer.calcite.rules;
+
+import org.apache.calcite.plan.RelOptCluster;
+import org.apache.calcite.plan.RelOptRule;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.calcite.rel.type.RelDataTypeField;
+import org.apache.calcite.rex.RexCall;
+import org.apache.calcite.rex.RexInputRef;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.tools.RelBuilderFactory;
+import org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelFactories;
+import org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveTableFunctionScan;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.Lists;
+
+import java.util.ArrayList;
+import java.util.List;
+
+/**
+ * This rule optimizes the inline udtf in a HiveTableFunctionScan when it
+ * has an array of structures. The RelNode for a HiveTableFunctionScan places
+ * the input references as the first elements in the return type followed by
+ * the udtf return value which represents the items in the generated table. Take the
+ * case where the base (input) table has col1, and the inline function is represented by:
+ * inline(array( struct1(col2, col3), struct2(col2, col3), struct3(col2, col3), etc...)),
+ * ...and the return value for the table scan node is (col1, col2, col3). In this case,
+ * the same col1 value is joined with the structures within the inline array for the
+ * col2 and col3 values.
+ *
+ * The optimization is to put the "col1" value within the inline array, resulting
+ * in the new structure:
+ * inline(array(struct1(col1, col2, col3), struct2(col1, col2, col3), ...))
+ * By doing this, we avoid creating a lateral view join operator and a lateral view forward
+ * operator at runtime.
+ */
+public class HiveOptimizeInlineArrayTableFunctionRule extends RelOptRule {
+
+  public static final HiveOptimizeInlineArrayTableFunctionRule INSTANCE =
+          new HiveOptimizeInlineArrayTableFunctionRule(HiveRelFactories.HIVE_BUILDER);
+
+  public HiveOptimizeInlineArrayTableFunctionRule(RelBuilderFactory relBuilderFactory) {
+    super(operand(HiveTableFunctionScan.class, any()), relBuilderFactory, null);
+  }
+
+  @Override
+  public boolean matches(RelOptRuleCall call) {
+    final HiveTableFunctionScan tableFunctionScanRel = call.rel(0);
+
+    Preconditions.checkState(tableFunctionScanRel.getCall() instanceof RexCall);

Review Comment:
    I think I like it this way because Hive should never allow anything other than a RexCall within the HiveTableFunctionScan.  Arguably I could get rid of this Preconditions check, since the very next statement (the cast) would fail anyway.  Not sure which is the preferred Hive way.
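    
    For reference, a minimal self-contained sketch of the two variants being weighed here (the wrapper class and the `scan` parameter are just for illustration):
    
    ```java
    import com.google.common.base.Preconditions;
    
    import org.apache.calcite.rex.RexCall;
    import org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveTableFunctionScan;
    
    final class PreconditionVsCastSketch {
      // Variant kept in the PR: state the invariant explicitly; a violation surfaces as an
      // IllegalStateException from Preconditions with a clear point of failure.
      static RexCall udtfCallWithCheck(HiveTableFunctionScan scan) {
        Preconditions.checkState(scan.getCall() instanceof RexCall);
        return (RexCall) scan.getCall();
      }
    
      // Alternative discussed above: drop the check and let the cast itself fail
      // with a ClassCastException if the call were ever not a RexCall.
      static RexCall udtfCallCastOnly(HiveTableFunctionScan scan) {
        return (RexCall) scan.getCall();
      }
    }
    ```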




[GitHub] [hive] sonarcloud[bot] commented on pull request #4378: HIVE-27391: Refactor Calcite node generation from lateral views

Posted by "sonarcloud[bot] (via GitHub)" <gi...@apache.org>.
sonarcloud[bot] commented on PR #4378:
URL: https://github.com/apache/hive/pull/4378#issuecomment-1575393276

   Kudos, SonarCloud Quality Gate passed! (https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=4378)
   
   0 Bugs, 0 Vulnerabilities, 0 Security Hotspots, 13 Code Smells (all rated A).
   No Coverage information. No Duplication information.
   
   



[GitHub] [hive] sonarcloud[bot] commented on pull request #4378: HIVE-27391: Refactor Calcite node generation from lateral views

Posted by "sonarcloud[bot] (via GitHub)" <gi...@apache.org>.
sonarcloud[bot] commented on PR #4378:
URL: https://github.com/apache/hive/pull/4378#issuecomment-1589852995

   Kudos, SonarCloud Quality Gate passed! (https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=4378)
   
   0 Bugs, 0 Vulnerabilities, 0 Security Hotspots, 13 Code Smells (all rated A).
   No Coverage information. No Duplication information.
   
   



[GitHub] [hive] sonarcloud[bot] commented on pull request #4378: HIVE-27391: Refactor Calcite node generation from lateral views

Posted by "sonarcloud[bot] (via GitHub)" <gi...@apache.org>.
sonarcloud[bot] commented on PR #4378:
URL: https://github.com/apache/hive/pull/4378#issuecomment-1588249644

   Kudos, SonarCloud Quality Gate passed! (https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=4378)
   
   0 Bugs, 0 Vulnerabilities, 0 Security Hotspots, 13 Code Smells (all rated A).
   No Coverage information. No Duplication information.
   
   



[GitHub] [hive] kasakrisz commented on a diff in pull request #4378: HIVE-27391: Refactor Calcite node generation from lateral views

Posted by "kasakrisz (via GitHub)" <gi...@apache.org>.
kasakrisz commented on code in PR #4378:
URL: https://github.com/apache/hive/pull/4378#discussion_r1226127534


##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveOptimizeInlineArrayTableFunctionRule.java:
##########
@@ -0,0 +1,128 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.optimizer.calcite.rules;
+
+import org.apache.calcite.plan.RelOptCluster;
+import org.apache.calcite.plan.RelOptRule;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.calcite.rel.type.RelDataTypeField;
+import org.apache.calcite.rex.RexCall;
+import org.apache.calcite.rex.RexInputRef;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.tools.RelBuilderFactory;
+import org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelFactories;
+import org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveTableFunctionScan;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.Lists;
+
+import java.util.ArrayList;
+import java.util.List;
+
+/**
+ * This rule optimizes the inline udtf in a HiveTableFunctionScan when it
+ * has an array of structures. The RelNode for a HiveTableFunctionScan places
+ * the input references as the first elements in the return type followed by
+ * the udtf return value which represents the items in the generated table. Take the
+ * case where the base (input) table has col1, and the inline function is represented by:
+ * inline(array( struct1(col2, col3), struct2(col2, col3), struct3(col2, col3), etc...)),
+ * ...and the return value for the table scan node is (col1, col2, col3). In this case,
+ * the same col1 value is joined with the structures within the inline array for the
+ * col2 and col3 values.
+ *
+ * The optimization is to put the "col1" value within the inline array, resulting
+ * in the new structure:
+ * inline(array(struct1(col1, col2, col3), struct2(col1, col2, col3), ...))
+ * By doing this, we avoid creating a lateral view join operator and a lateral view forward
+ * operator at runtime.
+ */
+public class HiveOptimizeInlineArrayTableFunctionRule extends RelOptRule {
+
+  public static final HiveOptimizeInlineArrayTableFunctionRule INSTANCE =
+          new HiveOptimizeInlineArrayTableFunctionRule(HiveRelFactories.HIVE_BUILDER);
+
+  public HiveOptimizeInlineArrayTableFunctionRule(RelBuilderFactory relBuilderFactory) {
+    super(operand(HiveTableFunctionScan.class, any()), relBuilderFactory, null);
+  }
+
+  @Override
+  public boolean matches(RelOptRuleCall call) {
+    final HiveTableFunctionScan tableFunctionScanRel = call.rel(0);
+
+    Preconditions.checkState(tableFunctionScanRel.getCall() instanceof RexCall);
+    RexCall udtfCall = (RexCall) tableFunctionScanRel.getCall();
+    if (!udtfCall.getOperator().getName().equalsIgnoreCase("inline")) {

Review Comment:
   How about
   ```
   Objects.equals(firstOperand.getOperator().getName(), SqlFunctionConverter.reverseOperatorMap.get(SqlStdOperatorTable.ARRAY_VALUE_CONSTRUCTOR))
   ```
   ?
   However, in this case two `null`s would still be considered equal, which is not OK.
   
   Btw, could you please elaborate on this? `I would prefer a general constant for "array" within that class`
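   
   For illustration, a minimal runnable sketch of that `null` pitfall (the variable names are made up):
   
   ```java
   import java.util.Objects;
   
   public class ObjectsEqualsNullSketch {
     public static void main(String[] args) {
       // If the operator were missing from reverseOperatorMap and the operator name
       // were also null, Objects.equals would report a (spurious) match.
       String operatorName = null;
       String mappedName = null;
       System.out.println(Objects.equals(operatorName, mappedName)); // prints: true
     }
   }
   ```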




[GitHub] [hive] kasakrisz commented on a diff in pull request #4378: HIVE-27391: Refactor Calcite node generation from lateral views

Posted by "kasakrisz (via GitHub)" <gi...@apache.org>.
kasakrisz commented on code in PR #4378:
URL: https://github.com/apache/hive/pull/4378#discussion_r1226139074


##########
ql/src/java/org/apache/hadoop/hive/ql/parse/relnodegen/LateralViewPlan.java:
##########
@@ -0,0 +1,265 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.parse.relnodegen;
+
+import org.apache.calcite.plan.RelOptCluster;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.calcite.rel.type.RelDataTypeField;
+import org.apache.calcite.rex.RexCall;
+import org.apache.calcite.rex.RexNode;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.ErrorMsg;
+import org.apache.hadoop.hive.ql.exec.ColumnInfo;
+import org.apache.hadoop.hive.ql.lib.Node;
+import org.apache.hadoop.hive.ql.optimizer.calcite.TraitsUtil;
+import org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveTableFunctionScan;
+import org.apache.hadoop.hive.ql.optimizer.calcite.translator.TypeConverter;
+import org.apache.hadoop.hive.ql.parse.ASTErrorUtils;
+import org.apache.hadoop.hive.ql.parse.ASTNode;
+import org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer;
+import org.apache.hadoop.hive.ql.parse.CalcitePlanner;
+import org.apache.hadoop.hive.ql.parse.HiveParser;
+import org.apache.hadoop.hive.ql.parse.RowResolver;
+import org.apache.hadoop.hive.ql.parse.SemanticAnalyzer;
+import org.apache.hadoop.hive.ql.parse.SemanticException;
+import org.apache.hadoop.hive.ql.parse.UnparseTranslator;
+import org.apache.hadoop.hive.ql.parse.type.FunctionHelper;
+import org.apache.hadoop.hive.ql.parse.type.TypeCheckCtx;
+import org.apache.hadoop.hive.serde2.typeinfo.StructTypeInfo;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.ImmutableList;
+import com.google.common.collect.ImmutableSet;
+import com.google.common.collect.Lists;
+
+import java.util.ArrayList;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Set;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * LateralViewPlan is a helper class holding the objects needed for generating a Calcite
+ * plan from an ASTNode. The object is to be generated when a LATERAL_VIEW token is detected.
+ * From the ASTNode and relevant input node information, the following objects are created:
+ * * A HiveTableFunctionScan RelNode
+ * * A RowResolver containing the row resolving information output from the RelNode
+ * * The table alias for the table generated by the UDTF and Lateral View
+ */
+public class LateralViewPlan {
+  protected static final Logger LOG = LoggerFactory.getLogger(LateralViewPlan.class.getName());
+
+  // Only acceptable token types under the TOK_LATERAL_VIEW token.
+  public static final ImmutableSet<Integer> TABLE_ALIAS_TOKEN_TYPES =
+      ImmutableSet.of(HiveParser.TOK_SUBQUERY, HiveParser.TOK_TABREF, HiveParser.TOK_PTBLFUNCTION);
+
+  // The RelNode created for this lateral view
+  public final RelNode lateralViewRel;
+
+  // The output RowResolver created for this lateral view.
+  public final RowResolver outputRR;
+
+  // The alias provided for the lateral table in the query
+  public final String lateralTableAlias;
+
+  private final RelOptCluster cluster;
+  private final UnparseTranslator unparseTranslator;
+  private final HiveConf conf;
+  private final FunctionHelper functionHelper;
+
+  public LateralViewPlan(ASTNode lateralView, RelOptCluster cluster, RelNode inputRel,
+      RowResolver inputRR, UnparseTranslator unparseTranslator,
+      HiveConf conf, FunctionHelper functionHelper
+      ) throws SemanticException {
+    // initialize global variables containing helper information
+    this.cluster = cluster;
+    this.unparseTranslator = unparseTranslator;
+    this.conf = conf;
+    this.functionHelper = functionHelper;
+
+    // AST should have form of LATERAL_VIEW -> SELECT -> SELEXPR -> FUNCTION -> function info tree
+    ASTNode selExprAST = (ASTNode) lateralView.getChild(0).getChild(0);
+    ASTNode functionAST = (ASTNode) selExprAST.getChild(0);
+
+    this.lateralTableAlias = getTableAliasFromASTNode(selExprAST);
+
+    // The RexCall for the udtf function (e.g. inline)
+    RexCall udtfCall = getUDTFFunction(functionAST, inputRR);
+
+    // Column aliases provided by the query.
+    List<String> columnAliases = getColumnAliasesFromASTNode(selExprAST, udtfCall);
+
+    this.outputRR = getOutputRR(inputRR, udtfCall, columnAliases, this.lateralTableAlias);
+
+    RelDataType retType = getRetType(cluster, inputRel, udtfCall, columnAliases);
+
+    this.lateralViewRel = HiveTableFunctionScan.create(cluster,
+        TraitsUtil.getDefaultTraitSet(cluster), ImmutableList.of(inputRel), udtfCall,
+        null, retType, null);
+  }
+
+  public static void validateLateralView(ASTNode lateralView) throws SemanticException {
+    if (lateralView.getChildCount() != 2) {
+      throw new SemanticException("Token Lateral View contains " + lateralView.getChildCount() +
+          " children.");
+    }
+    ASTNode next = (ASTNode) lateralView.getChild(1);
+    if (!TABLE_ALIAS_TOKEN_TYPES.contains(next.getToken().getType()) &&
+          HiveParser.TOK_LATERAL_VIEW != next.getToken().getType()) {
+        throw new SemanticException(ASTErrorUtils.getMsg(
+            ErrorMsg.LATERAL_VIEW_INVALID_CHILD.getMsg(), lateralView));
+    }
+  }
+
+  private RexCall getUDTFFunction(ASTNode functionAST, RowResolver inputRR)
+      throws SemanticException {
+
+    String functionName = functionAST.getChild(0).getText().toLowerCase();
+
+    // create the RexNode operands for the UDTF RexCall
+    List<RexNode> operandsForUDTF = getOperandsForUDTF(functionAST, inputRR);
+
+    return this.functionHelper.getUDTFFunction(functionName, operandsForUDTF);
+  }
+
+  private String getTableAliasFromASTNode(ASTNode selExprClause) throws SemanticException {
+    // loop through the AST and find the TOK_TABALIAS object
+    for (Node obj : selExprClause.getChildren()) {
+      ASTNode child = (ASTNode) obj;
+      if (child.getToken().getType() == HiveParser.TOK_TABALIAS) {
+        return BaseSemanticAnalyzer.unescapeIdentifier(child.getChild(0).getText().toLowerCase());
+      }
+    }
+
+    // Parser enforces that table alias is added, but check again
+    throw new SemanticException("Alias should be specified LVJ");
+  }
+
+  private List<String> getColumnAliasesFromASTNode(ASTNode selExprClause,
+      RexCall udtfCall) throws SemanticException {
+    Set<String> uniqueNames = new HashSet<>();
+    List<String> colAliases = new ArrayList<>();
+    for (Node obj : selExprClause.getChildren()) {
+      ASTNode child = (ASTNode) obj;
+      // Skip the token values.  The rest should be the identifier column aliases
+      if (child.getToken().getType() == HiveParser.TOK_TABALIAS ||
+          child.getToken().getType() == HiveParser.TOK_FUNCTION) {
+        continue;
+      }
+      String colAlias = BaseSemanticAnalyzer.unescapeIdentifier(child.getText().toLowerCase());
+      if (uniqueNames.contains(colAlias)) {
+        // Column aliases defined by query for lateral view output are duplicated
+        throw new SemanticException(ErrorMsg.COLUMN_ALIAS_ALREADY_EXISTS.getMsg(colAlias));
+      }
+      uniqueNames.add(colAlias);
+      colAliases.add(colAlias);
+    }
+
+    // if no column aliases were provided, just retrieve them from the return type
+    // of the udtf RexCall
+    if (colAliases.isEmpty()) {
+      colAliases.addAll(
+          Lists.transform(udtfCall.getType().getFieldList(), RelDataTypeField::getName));
+    }
+
+    // Verify that there is an alias for all the columns returned by the udtf call.
+    int udtfFieldCount = udtfCall.getType().getFieldCount();
+    if (colAliases.size() != udtfFieldCount) {
+      // Number of columns in the aliases does not match with number of columns
+      // generated by the lateral view
+      throw new SemanticException(ErrorMsg.UDTF_ALIAS_MISMATCH.getMsg(
+          "expected " + udtfFieldCount + " aliases " + "but got " + colAliases.size()));
+    }
+
+    return colAliases;
+  }
+
+  private List<RexNode> getOperandsForUDTF(ASTNode functionCall,
+      RowResolver inputRR) throws SemanticException {
+    List<RexNode> operands = new ArrayList<>();
+    TypeCheckCtx tcCtx = new TypeCheckCtx(inputRR, this.cluster.getRexBuilder(), false, false);
+    tcCtx.setUnparseTranslator(this.unparseTranslator);
+    // Start at 1 because value 0 is the function name.  Use the CalcitePlanner.genRexNode
+    // to retrieve the RexNode for all the function parameters.
+    for (int i = 1; i < functionCall.getChildren().size(); ++i) {
+      ASTNode functionParam = (ASTNode) functionCall.getChild(i);
+      operands.add(CalcitePlanner.genRexNode(functionParam, inputRR, tcCtx, this.conf));
+    }
+    return operands;
+  }
+
+  private RowResolver getOutputRR(RowResolver inputRR, RexCall udtfCall,
+      List<String> columnAliases, String lateralTableAlias) throws SemanticException {
+
+    RowResolver localOutputRR = new RowResolver();
+
+    // After calling RowResolver.add, localOutputRR will be mutated to contain the
+    // fields from inputRR
+    if (!RowResolver.add(localOutputRR, inputRR)) {
+      LOG.warn("Duplicates detected when adding columns to RR: see previous message");
+    }
+
+    // The RexNode return value for a udtf is always a struct.
+    TypeInfo typeInfo = TypeConverter.convert(udtfCall.getType());
+    Preconditions.checkState(typeInfo instanceof StructTypeInfo);
+
+    StructTypeInfo typeInfos = (StructTypeInfo) typeInfo;
+    // Match up the column alias with the return value of the udtf and
+    // place in the outputRR
+    for (int i = 0, j = 0; i < columnAliases.size(); i++) {

Review Comment:
   Ah ok, the value of `j` makes these internal column names unique in the RR. Let's keep this as it is.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kasakrisz commented on a diff in pull request #4378: HIVE-27391: Refactor Calcite node generation from lateral views

Posted by "kasakrisz (via GitHub)" <gi...@apache.org>.
kasakrisz commented on code in PR #4378:
URL: https://github.com/apache/hive/pull/4378#discussion_r1217940579


##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveOptimizeInlineArrayTableFunctionRule.java:
##########
@@ -0,0 +1,128 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.optimizer.calcite.rules;
+
+import org.apache.calcite.plan.RelOptCluster;
+import org.apache.calcite.plan.RelOptRule;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.calcite.rel.type.RelDataTypeField;
+import org.apache.calcite.rex.RexCall;
+import org.apache.calcite.rex.RexInputRef;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.tools.RelBuilderFactory;
+import org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelFactories;
+import org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveTableFunctionScan;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.Lists;
+
+import java.util.ArrayList;
+import java.util.List;
+
+/**
+ * This rule optimizes the inline udtf in a HiveTableFunctionScan when it
+ * has an array of structures. The RelNode for a HiveTableFunctionScan places
+ * the input references as the first elements in the return type followed by
+ * the udtf return value which represents the items in the generated table. Take the
+ * case where the base (input) table has col1, and the inline function is represented by:
+ * inline(array( struct1(col2, col3), struct2(col2, col3), struct3(col2, col3), etc...)),
+ * ...and the return value for the table scan node is (col1, col2, col3). In this case,
+ * the same col1 value is joined with the structures within the inline array for the
+ * col2 and col3 values.
+ *
+ * The optimization is to put the "col1" value within the inline array, resulting in
+ * the new structure:
+ * inline(array(struct1(col1, col2, col3), struct2(col1, col2, col3), ...))
+ * By doing this, we avoid creating a lateral view join operator and a lateral view forward
+ * operator at runtime.
+ */
+public class HiveOptimizeInlineArrayTableFunctionRule extends RelOptRule {
+
+  public static final HiveOptimizeInlineArrayTableFunctionRule INSTANCE =
+          new HiveOptimizeInlineArrayTableFunctionRule(HiveRelFactories.HIVE_BUILDER);
+
+  public HiveOptimizeInlineArrayTableFunctionRule(RelBuilderFactory relBuilderFactory) {
+    super(operand(HiveTableFunctionScan.class, any()), relBuilderFactory, null);
+  }
+
+  @Override
+  public boolean matches(RelOptRuleCall call) {
+    final HiveTableFunctionScan tableFunctionScanRel = call.rel(0);
+
+    Preconditions.checkState(tableFunctionScanRel.getCall() instanceof RexCall);
+    RexCall udtfCall = (RexCall) tableFunctionScanRel.getCall();
+    if (!udtfCall.getOperator().getName().equalsIgnoreCase("inline")) {

Review Comment:
   Could you please swap the operands like
   ```
   "inline".equalsIgnoreCase(udtfCall.getOperator().getName())
   ```
   It makes this function call null safe.
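   Applied in context, the null-safe form of the check would read roughly as follows (a sketch of the suggested swap, not the exact patch text):
   ```
   if (!"inline".equalsIgnoreCase(udtfCall.getOperator().getName())) {
     return false;
   }
   ```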



##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveOptimizeInlineArrayTableFunctionRule.java:
##########
@@ -0,0 +1,128 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.optimizer.calcite.rules;
+
+import org.apache.calcite.plan.RelOptCluster;
+import org.apache.calcite.plan.RelOptRule;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.calcite.rel.type.RelDataTypeField;
+import org.apache.calcite.rex.RexCall;
+import org.apache.calcite.rex.RexInputRef;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.tools.RelBuilderFactory;
+import org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelFactories;
+import org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveTableFunctionScan;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.Lists;
+
+import java.util.ArrayList;
+import java.util.List;
+
+/**
+ * This rule optimizes the inline udtf in a HiveTableFunctionScan when it
+ * has an array of structures. The RelNode for a HiveTableFunctionScan places
+ * the input references as the first elements in the return type followed by
+ * the udtf return value which represents the items in the generated table. Take the
+ * case where the base (input) table has col1, and the inline function is represented by:
+ * inline(array( struct1(col2, col3), struct2(col2, col3), struct3(col2, col3), etc...)),
+ * ...and the return value for the table scan node is (col1, col2, col3). In this case,
+ * the same col1 value is joined with the structures within the inline array for the
+ * col2 and col3 values.
+ *
+ * The optimization is to put the "col1" value within the inline array, resulting in
+ * the new structure:
+ * inline(array(struct1(col1, col2, col3), struct2(col1, col2, col3), ...))
+ * By doing this, we avoid creating a lateral view join operator and a lateral view forward
+ * operator at runtime.
+ */
+public class HiveOptimizeInlineArrayTableFunctionRule extends RelOptRule {
+
+  public static final HiveOptimizeInlineArrayTableFunctionRule INSTANCE =
+          new HiveOptimizeInlineArrayTableFunctionRule(HiveRelFactories.HIVE_BUILDER);
+
+  public HiveOptimizeInlineArrayTableFunctionRule(RelBuilderFactory relBuilderFactory) {
+    super(operand(HiveTableFunctionScan.class, any()), relBuilderFactory, null);
+  }
+
+  @Override
+  public boolean matches(RelOptRuleCall call) {
+    final HiveTableFunctionScan tableFunctionScanRel = call.rel(0);
+
+    Preconditions.checkState(tableFunctionScanRel.getCall() instanceof RexCall);

Review Comment:
   IIUC `Preconditions.checkState` throws an exception if the passed boolean expression is false.
   Is it intended? How about returning `false`?



##########
ql/src/java/org/apache/hadoop/hive/ql/parse/type/HiveFunctionHelper.java:
##########
@@ -448,6 +451,23 @@ public AggregateInfo getWindowAggregateFunctionInfo(boolean isDistinct, boolean
         new AggregateInfo(aggregateParameters, returnType, aggregateName, isDistinct) : null;
   }
 
+  public RexCall getUDTFFunction(String functionName, List<RexNode> operands)
+      throws SemanticException {
+    // Extract the argument types for the operands into a list
+    List<RelDataType> operandTypes = Lists.transform(operands, RexNode::getType);
+
+    FunctionInfo functionInfo = FunctionRegistry.getFunctionInfo(functionName);
+    GenericUDTF genericUDTF = functionInfo.getGenericUDTF();
+    Preconditions.checkNotNull(genericUDTF);

Review Comment:
   This will throw an NPE if `genericUDTF` is null. Could you please add an error message?
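   For illustration, the existing check could carry a message via Guava's two-argument overload (a sketch; the exact wording is an assumption, not the patch text):
   ```
   Preconditions.checkNotNull(genericUDTF,
       "Function " + functionName + " is not registered as a generic UDTF");
   ```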



##########
ql/src/java/org/apache/hadoop/hive/ql/parse/type/FunctionHelper.java:
##########
@@ -78,6 +79,12 @@ AggregateInfo getWindowAggregateFunctionInfo(boolean isDistinct, boolean isAllCo
       String aggregateName, List<RexNode> aggregateParameters)
       throws SemanticException;
 
+  /**
+   * Returns RexCall for UDTF based on given parameters
+   */
+  RexCall getUDTFFunction(String functionName, List<RexNode> operands)

Review Comment:
   nit.: IIUC this method does more than just get an entry from the `FunctionRegistry`; it also converts it to a `RexCall`.



##########
ql/src/java/org/apache/hadoop/hive/ql/parse/relnodegen/LateralViewPlan.java:
##########
@@ -0,0 +1,265 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.parse.relnodegen;
+
+import org.apache.calcite.plan.RelOptCluster;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.calcite.rel.type.RelDataTypeField;
+import org.apache.calcite.rex.RexCall;
+import org.apache.calcite.rex.RexNode;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.ErrorMsg;
+import org.apache.hadoop.hive.ql.exec.ColumnInfo;
+import org.apache.hadoop.hive.ql.lib.Node;
+import org.apache.hadoop.hive.ql.optimizer.calcite.TraitsUtil;
+import org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveTableFunctionScan;
+import org.apache.hadoop.hive.ql.optimizer.calcite.translator.TypeConverter;
+import org.apache.hadoop.hive.ql.parse.ASTErrorUtils;
+import org.apache.hadoop.hive.ql.parse.ASTNode;
+import org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer;
+import org.apache.hadoop.hive.ql.parse.CalcitePlanner;
+import org.apache.hadoop.hive.ql.parse.HiveParser;
+import org.apache.hadoop.hive.ql.parse.RowResolver;
+import org.apache.hadoop.hive.ql.parse.SemanticAnalyzer;
+import org.apache.hadoop.hive.ql.parse.SemanticException;
+import org.apache.hadoop.hive.ql.parse.UnparseTranslator;
+import org.apache.hadoop.hive.ql.parse.type.FunctionHelper;
+import org.apache.hadoop.hive.ql.parse.type.TypeCheckCtx;
+import org.apache.hadoop.hive.serde2.typeinfo.StructTypeInfo;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.ImmutableList;
+import com.google.common.collect.ImmutableSet;
+import com.google.common.collect.Lists;
+
+import java.util.ArrayList;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Set;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * LateralViewPlan is a helper class holding the objects needed for generating a Calcite
+ * plan from an ASTNode. The object is to be generated when a LATERAL_VIEW token is detected.
+ * From the ASTNode and relevant input node information, the following objects are created:
+ * * A HiveTableFunctionScan RelNode
+ * * A RowResolver containing the row resolving information output from the RelNode
+ * * The table alias for the table generated by the UDTF and Lateral View
+ */
+public class LateralViewPlan {
+  protected static final Logger LOG = LoggerFactory.getLogger(LateralViewPlan.class.getName());
+
+  // Only acceptable token types under the TOK_LATERAL_VIEW token.
+  public static final ImmutableSet<Integer> TABLE_ALIAS_TOKEN_TYPES =
+      ImmutableSet.of(HiveParser.TOK_SUBQUERY, HiveParser.TOK_TABREF, HiveParser.TOK_PTBLFUNCTION);
+
+  // The RelNode created for this lateral view
+  public final RelNode lateralViewRel;
+
+  // The output RowResolver created for this lateral view.
+  public final RowResolver outputRR;
+
+  // The alias provided for the lateral table in the query
+  public final String lateralTableAlias;
+
+  private final RelOptCluster cluster;
+  private final UnparseTranslator unparseTranslator;
+  private final HiveConf conf;
+  private final FunctionHelper functionHelper;
+
+  public LateralViewPlan(ASTNode lateralView, RelOptCluster cluster, RelNode inputRel,
+      RowResolver inputRR, UnparseTranslator unparseTranslator,
+      HiveConf conf, FunctionHelper functionHelper
+      ) throws SemanticException {
+    // initialize global variables containing helper information
+    this.cluster = cluster;
+    this.unparseTranslator = unparseTranslator;
+    this.conf = conf;
+    this.functionHelper = functionHelper;
+
+    // AST should have form of LATERAL_VIEW -> SELECT -> SELEXPR -> FUNCTION -> function info tree
+    ASTNode selExprAST = (ASTNode) lateralView.getChild(0).getChild(0);
+    ASTNode functionAST = (ASTNode) selExprAST.getChild(0);
+
+    this.lateralTableAlias = getTableAliasFromASTNode(selExprAST);
+
+    // The RexCall for the udtf function (e.g. inline)
+    RexCall udtfCall = getUDTFFunction(functionAST, inputRR);
+
+    // Column aliases provided by the query.
+    List<String> columnAliases = getColumnAliasesFromASTNode(selExprAST, udtfCall);
+
+    this.outputRR = getOutputRR(inputRR, udtfCall, columnAliases, this.lateralTableAlias);
+
+    RelDataType retType = getRetType(cluster, inputRel, udtfCall, columnAliases);
+
+    this.lateralViewRel = HiveTableFunctionScan.create(cluster,
+        TraitsUtil.getDefaultTraitSet(cluster), ImmutableList.of(inputRel), udtfCall,
+        null, retType, null);
+  }
+
+  public static void validateLateralView(ASTNode lateralView) throws SemanticException {
+    if (lateralView.getChildCount() != 2) {
+      throw new SemanticException("Token Lateral View contains " + lateralView.getChildCount() +
+          " children.");
+    }
+    ASTNode next = (ASTNode) lateralView.getChild(1);
+    if (!TABLE_ALIAS_TOKEN_TYPES.contains(next.getToken().getType()) &&
+          HiveParser.TOK_LATERAL_VIEW != next.getToken().getType()) {
+        throw new SemanticException(ASTErrorUtils.getMsg(
+            ErrorMsg.LATERAL_VIEW_INVALID_CHILD.getMsg(), lateralView));
+    }
+  }
+
+  private RexCall getUDTFFunction(ASTNode functionAST, RowResolver inputRR)
+      throws SemanticException {
+
+    String functionName = functionAST.getChild(0).getText().toLowerCase();
+
+    // create the RexNode operands for the UDTF RexCall
+    List<RexNode> operandsForUDTF = getOperandsForUDTF(functionAST, inputRR);
+
+    return this.functionHelper.getUDTFFunction(functionName, operandsForUDTF);
+  }
+
+  private String getTableAliasFromASTNode(ASTNode selExprClause) throws SemanticException {
+    // loop through the AST and find the TOK_TABALIAS object
+    for (Node obj : selExprClause.getChildren()) {
+      ASTNode child = (ASTNode) obj;
+      if (child.getToken().getType() == HiveParser.TOK_TABALIAS) {
+        return BaseSemanticAnalyzer.unescapeIdentifier(child.getChild(0).getText().toLowerCase());
+      }
+    }
+
+    // Parser enforces that table alias is added, but check again
+    throw new SemanticException("Alias should be specified LVJ");
+  }
+
+  private List<String> getColumnAliasesFromASTNode(ASTNode selExprClause,
+      RexCall udtfCall) throws SemanticException {
+    Set<String> uniqueNames = new HashSet<>();
+    List<String> colAliases = new ArrayList<>();
+    for (Node obj : selExprClause.getChildren()) {
+      ASTNode child = (ASTNode) obj;
+      // Skip the token values.  The rest should be the identifier column aliases
+      if (child.getToken().getType() == HiveParser.TOK_TABALIAS ||
+          child.getToken().getType() == HiveParser.TOK_FUNCTION) {
+        continue;
+      }
+      String colAlias = BaseSemanticAnalyzer.unescapeIdentifier(child.getText().toLowerCase());
+      if (uniqueNames.contains(colAlias)) {
+        // Column aliases defined by query for lateral view output are duplicated
+        throw new SemanticException(ErrorMsg.COLUMN_ALIAS_ALREADY_EXISTS.getMsg(colAlias));
+      }
+      uniqueNames.add(colAlias);
+      colAliases.add(colAlias);
+    }
+
+    // if no column aliases were provided, just retrieve them from the return type
+    // of the udtf RexCall
+    if (colAliases.isEmpty()) {
+      colAliases.addAll(
+          Lists.transform(udtfCall.getType().getFieldList(), RelDataTypeField::getName));
+    }
+
+    // Verify that there is an alias for all the columns returned by the udtf call.
+    int udtfFieldCount = udtfCall.getType().getFieldCount();
+    if (colAliases.size() != udtfFieldCount) {
+      // Number of columns in the aliases does not match with number of columns
+      // generated by the lateral view
+      throw new SemanticException(ErrorMsg.UDTF_ALIAS_MISMATCH.getMsg(
+          "expected " + udtfFieldCount + " aliases " + "but got " + colAliases.size()));

Review Comment:
   nit.: `" aliases but got "`
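   i.e. merging the two literals into one:
   ```
   "expected " + udtfFieldCount + " aliases but got " + colAliases.size()
   ```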



##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveOptimizeInlineArrayTableFunctionRule.java:
##########
@@ -0,0 +1,128 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.optimizer.calcite.rules;
+
+import org.apache.calcite.plan.RelOptCluster;
+import org.apache.calcite.plan.RelOptRule;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.calcite.rel.type.RelDataTypeField;
+import org.apache.calcite.rex.RexCall;
+import org.apache.calcite.rex.RexInputRef;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.tools.RelBuilderFactory;
+import org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelFactories;
+import org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveTableFunctionScan;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.Lists;
+
+import java.util.ArrayList;
+import java.util.List;
+
+/**
+ * This rule optimizes the inline udtf in a HiveTableFunctionScan when it
+ * has an array of structures. The RelNode for a HiveTableFunctionScan places
+ * the input references as the first elements in the return type followed by
+ * the udtf return value which represents the items in the generated table. Take the
+ * case where the base (input) table has col1, and the inline function is represented by:
+ * inline(array( struct1(col2, col3), struct2(col2, col3), struct3(col2, col3), etc...)),
+ * ...and the return value for the table scan node is (col1, col2, col3). In this case,
+ * the same col1 value is joined with the structures within the inline array for the
+ * col2 and col3 values.
+ *
+ * The optimization is to put the "col1" value within the inline array, resulting in
+ * the new structure:
+ * inline(array(struct1(col1, col2, col3), struct2(col1, col2, col3), ...))
+ * By doing this, we avoid creating a lateral view join operator and a lateral view forward
+ * operator at runtime.
+ */
+public class HiveOptimizeInlineArrayTableFunctionRule extends RelOptRule {
+
+  public static final HiveOptimizeInlineArrayTableFunctionRule INSTANCE =
+          new HiveOptimizeInlineArrayTableFunctionRule(HiveRelFactories.HIVE_BUILDER);
+
+  public HiveOptimizeInlineArrayTableFunctionRule(RelBuilderFactory relBuilderFactory) {
+    super(operand(HiveTableFunctionScan.class, any()), relBuilderFactory, null);
+  }
+
+  @Override
+  public boolean matches(RelOptRuleCall call) {
+    final HiveTableFunctionScan tableFunctionScanRel = call.rel(0);
+
+    Preconditions.checkState(tableFunctionScanRel.getCall() instanceof RexCall);
+    RexCall udtfCall = (RexCall) tableFunctionScanRel.getCall();
+    if (!udtfCall.getOperator().getName().equalsIgnoreCase("inline")) {
+      return false;
+    }
+
+    Preconditions.checkState(!udtfCall.getOperands().isEmpty());
+    RexNode operand = udtfCall.getOperands().get(0);
+    if (!(operand instanceof RexCall)) {
+      return false;
+    }
+    RexCall firstOperand = (RexCall) operand;
+    if (!firstOperand.getOperator().getName().equalsIgnoreCase("array")) {
+      return false;
+    }
+    Preconditions.checkState(!firstOperand.getOperands().isEmpty());
+    int numStructParams = firstOperand.getOperands().get(0).getType().getFieldCount();
+
+    if (tableFunctionScanRel.getRowType().getFieldCount() == numStructParams) {
+      return false;
+    }
+
+    return true;
+  }
+
+  public void onMatch(RelOptRuleCall call) {
+    final HiveTableFunctionScan tfs = call.rel(0);
+    RelNode inputRel = tfs.getInput(0);
+    RexCall inlineCall = (RexCall) tfs.getCall();
+    RexCall arrayCall = (RexCall) inlineCall.getOperands().get(0);
+    RelOptCluster cluster = tfs.getCluster();
+
+    List<RexNode> inputRefs = Lists.transform(inputRel.getRowType().getFieldList(),
+        input -> new RexInputRef(input.getIndex(), input.getType()));
+    List<RexNode> newStructExprs = new ArrayList<>();
+
+    for (RexNode currentStructOperand : arrayCall.getOperands()) {
+      List<RexNode> allOperands = new ArrayList<>(inputRefs);
+      RexCall structCall = (RexCall) currentStructOperand;
+      allOperands.addAll(structCall.getOperands());
+      newStructExprs.add(cluster.getRexBuilder().makeCall(structCall.op, allOperands));
+    }
+
+    List<RelDataType> returnTypes = new ArrayList<>(
+        Lists.transform(inputRel.getRowType().getFieldList(), RelDataTypeField::getType));
+    RexCall firstStructCall = (RexCall) arrayCall.getOperands().get(0);
+    returnTypes.addAll(Lists.transform(firstStructCall.getOperands(), RexNode::getType));
+
+    List<RexNode> newArrayCall =
+        Lists.newArrayList(cluster.getRexBuilder().makeCall(arrayCall.op, newStructExprs));

Review Comment:
   nit. `Collections.singletonList` can be used here if the result array is immutable.
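   A minimal sketch of that change, assuming the list is never modified afterwards:
   ```
   // uses java.util.Collections instead of Guava's Lists here
   List<RexNode> newArrayCall = Collections.singletonList(
       cluster.getRexBuilder().makeCall(arrayCall.op, newStructExprs));
   ```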



##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveOptimizeInlineArrayTableFunctionRule.java:
##########
@@ -0,0 +1,128 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.optimizer.calcite.rules;
+
+import org.apache.calcite.plan.RelOptCluster;
+import org.apache.calcite.plan.RelOptRule;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.calcite.rel.type.RelDataTypeField;
+import org.apache.calcite.rex.RexCall;
+import org.apache.calcite.rex.RexInputRef;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.tools.RelBuilderFactory;
+import org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelFactories;
+import org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveTableFunctionScan;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.Lists;
+
+import java.util.ArrayList;
+import java.util.List;
+
+/**
+ * This rule optimizes the inline udtf in a HiveTableFunctionScan when it
+ * has an array of structures. The RelNode for a HiveTableFunctionScan places
+ * the input references as the first elements in the return type followed by
+ * the udtf return value which represents the items in the generated table. Take the
+ * case where the base (input) table has col1, and the inline function is represented by:
+ * inline(array( struct1(col2, col3), struct2(col2, col3), struct3(col2, col3), etc...)),
+ * ...and the return value for the table scan node is (col1, col2, col3). In this case,
+ * the same col1 value is joined with the structures within the inline array for the
+ * col2 and col3 values.
+ *
+ * The optimization is to put the "col1" value within the inline array, resulting in
+ * the new structure:
+ * inline(array(struct1(col1, col2, col3), struct2(col1, col2, col3), ...))
+ * By doing this, we avoid creating a lateral view join operator and a lateral view forward
+ * operator at runtime.
+ */
+public class HiveOptimizeInlineArrayTableFunctionRule extends RelOptRule {
+
+  public static final HiveOptimizeInlineArrayTableFunctionRule INSTANCE =
+          new HiveOptimizeInlineArrayTableFunctionRule(HiveRelFactories.HIVE_BUILDER);
+
+  public HiveOptimizeInlineArrayTableFunctionRule(RelBuilderFactory relBuilderFactory) {
+    super(operand(HiveTableFunctionScan.class, any()), relBuilderFactory, null);
+  }
+
+  @Override
+  public boolean matches(RelOptRuleCall call) {
+    final HiveTableFunctionScan tableFunctionScanRel = call.rel(0);
+
+    Preconditions.checkState(tableFunctionScanRel.getCall() instanceof RexCall);
+    RexCall udtfCall = (RexCall) tableFunctionScanRel.getCall();
+    if (!udtfCall.getOperator().getName().equalsIgnoreCase("inline")) {
+      return false;
+    }
+
+    Preconditions.checkState(!udtfCall.getOperands().isEmpty());
+    RexNode operand = udtfCall.getOperands().get(0);
+    if (!(operand instanceof RexCall)) {
+      return false;
+    }
+    RexCall firstOperand = (RexCall) operand;
+    if (!firstOperand.getOperator().getName().equalsIgnoreCase("array")) {

Review Comment:
   Please swap the operands
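   i.e., mirroring the earlier suggestion (a sketch):
   ```
   if (!"array".equalsIgnoreCase(firstOperand.getOperator().getName())) {
     return false;
   }
   ```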



##########
ql/src/java/org/apache/hadoop/hive/ql/parse/relnodegen/LateralViewPlan.java:
##########
@@ -0,0 +1,265 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.parse.relnodegen;
+
+import org.apache.calcite.plan.RelOptCluster;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.calcite.rel.type.RelDataTypeField;
+import org.apache.calcite.rex.RexCall;
+import org.apache.calcite.rex.RexNode;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.ErrorMsg;
+import org.apache.hadoop.hive.ql.exec.ColumnInfo;
+import org.apache.hadoop.hive.ql.lib.Node;
+import org.apache.hadoop.hive.ql.optimizer.calcite.TraitsUtil;
+import org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveTableFunctionScan;
+import org.apache.hadoop.hive.ql.optimizer.calcite.translator.TypeConverter;
+import org.apache.hadoop.hive.ql.parse.ASTErrorUtils;
+import org.apache.hadoop.hive.ql.parse.ASTNode;
+import org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer;
+import org.apache.hadoop.hive.ql.parse.CalcitePlanner;
+import org.apache.hadoop.hive.ql.parse.HiveParser;
+import org.apache.hadoop.hive.ql.parse.RowResolver;
+import org.apache.hadoop.hive.ql.parse.SemanticAnalyzer;
+import org.apache.hadoop.hive.ql.parse.SemanticException;
+import org.apache.hadoop.hive.ql.parse.UnparseTranslator;
+import org.apache.hadoop.hive.ql.parse.type.FunctionHelper;
+import org.apache.hadoop.hive.ql.parse.type.TypeCheckCtx;
+import org.apache.hadoop.hive.serde2.typeinfo.StructTypeInfo;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.ImmutableList;
+import com.google.common.collect.ImmutableSet;
+import com.google.common.collect.Lists;
+
+import java.util.ArrayList;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Set;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * LateralViewPlan is a helper class holding the objects needed for generating a Calcite
+ * plan from an ASTNode. The object is to be generated when a LATERAL_VIEW token is detected.
+ * From the ASTNode and relevant input node information, the following objects are created:
+ * * A HiveTableFunctionScan RelNode
+ * * A RowResolver containing the row resolving information output from the RelNode
+ * * The table alias for the table generated by the UDTF and Lateral View
+ */
+public class LateralViewPlan {
+  protected static final Logger LOG = LoggerFactory.getLogger(LateralViewPlan.class.getName());
+
+  // Only acceptable token types under the TOK_LATERAL_VIEW token.
+  public static final ImmutableSet<Integer> TABLE_ALIAS_TOKEN_TYPES =
+      ImmutableSet.of(HiveParser.TOK_SUBQUERY, HiveParser.TOK_TABREF, HiveParser.TOK_PTBLFUNCTION);
+
+  // The RelNode created for this lateral view
+  public final RelNode lateralViewRel;
+
+  // The output RowResolver created for this lateral view.
+  public final RowResolver outputRR;
+
+  // The alias provided for the lateral table in the query
+  public final String lateralTableAlias;
+
+  private final RelOptCluster cluster;
+  private final UnparseTranslator unparseTranslator;
+  private final HiveConf conf;
+  private final FunctionHelper functionHelper;
+
+  public LateralViewPlan(ASTNode lateralView, RelOptCluster cluster, RelNode inputRel,
+      RowResolver inputRR, UnparseTranslator unparseTranslator,
+      HiveConf conf, FunctionHelper functionHelper
+      ) throws SemanticException {
+    // initialize global variables containing helper information
+    this.cluster = cluster;
+    this.unparseTranslator = unparseTranslator;
+    this.conf = conf;
+    this.functionHelper = functionHelper;
+
+    // AST should have form of LATERAL_VIEW -> SELECT -> SELEXPR -> FUNCTION -> function info tree
+    ASTNode selExprAST = (ASTNode) lateralView.getChild(0).getChild(0);
+    ASTNode functionAST = (ASTNode) selExprAST.getChild(0);
+
+    this.lateralTableAlias = getTableAliasFromASTNode(selExprAST);
+
+    // The RexCall for the udtf function (e.g. inline)
+    RexCall udtfCall = getUDTFFunction(functionAST, inputRR);
+
+    // Column aliases provided by the query.
+    List<String> columnAliases = getColumnAliasesFromASTNode(selExprAST, udtfCall);
+
+    this.outputRR = getOutputRR(inputRR, udtfCall, columnAliases, this.lateralTableAlias);
+
+    RelDataType retType = getRetType(cluster, inputRel, udtfCall, columnAliases);
+
+    this.lateralViewRel = HiveTableFunctionScan.create(cluster,
+        TraitsUtil.getDefaultTraitSet(cluster), ImmutableList.of(inputRel), udtfCall,
+        null, retType, null);
+  }
+
+  public static void validateLateralView(ASTNode lateralView) throws SemanticException {
+    if (lateralView.getChildCount() != 2) {
+      throw new SemanticException("Token Lateral View contains " + lateralView.getChildCount() +
+          " children.");
+    }
+    ASTNode next = (ASTNode) lateralView.getChild(1);
+    if (!TABLE_ALIAS_TOKEN_TYPES.contains(next.getToken().getType()) &&
+          HiveParser.TOK_LATERAL_VIEW != next.getToken().getType()) {
+        throw new SemanticException(ASTErrorUtils.getMsg(
+            ErrorMsg.LATERAL_VIEW_INVALID_CHILD.getMsg(), lateralView));
+    }
+  }
+
+  private RexCall getUDTFFunction(ASTNode functionAST, RowResolver inputRR)
+      throws SemanticException {
+
+    String functionName = functionAST.getChild(0).getText().toLowerCase();
+
+    // create the RexNode operands for the UDTF RexCall
+    List<RexNode> operandsForUDTF = getOperandsForUDTF(functionAST, inputRR);
+
+    return this.functionHelper.getUDTFFunction(functionName, operandsForUDTF);
+  }
+
+  private String getTableAliasFromASTNode(ASTNode selExprClause) throws SemanticException {
+    // loop through the AST and find the TOK_TABALIAS object
+    for (Node obj : selExprClause.getChildren()) {
+      ASTNode child = (ASTNode) obj;
+      if (child.getToken().getType() == HiveParser.TOK_TABALIAS) {
+        return BaseSemanticAnalyzer.unescapeIdentifier(child.getChild(0).getText().toLowerCase());
+      }
+    }
+
+    // Parser enforces that table alias is added, but check again
+    throw new SemanticException("Alias should be specified LVJ");
+  }
+
+  private List<String> getColumnAliasesFromASTNode(ASTNode selExprClause,
+      RexCall udtfCall) throws SemanticException {
+    Set<String> uniqueNames = new HashSet<>();
+    List<String> colAliases = new ArrayList<>();
+    for (Node obj : selExprClause.getChildren()) {
+      ASTNode child = (ASTNode) obj;
+      // Skip the token values.  The rest should be the identifier column aliases
+      if (child.getToken().getType() == HiveParser.TOK_TABALIAS ||
+          child.getToken().getType() == HiveParser.TOK_FUNCTION) {
+        continue;
+      }
+      String colAlias = BaseSemanticAnalyzer.unescapeIdentifier(child.getText().toLowerCase());
+      if (uniqueNames.contains(colAlias)) {
+        // Column aliases defined by query for lateral view output are duplicated
+        throw new SemanticException(ErrorMsg.COLUMN_ALIAS_ALREADY_EXISTS.getMsg(colAlias));
+      }
+      uniqueNames.add(colAlias);
+      colAliases.add(colAlias);
+    }
+
+    // if no column aliases were provided, just retrieve them from the return type
+    // of the udtf RexCall
+    if (colAliases.isEmpty()) {
+      colAliases.addAll(
+          Lists.transform(udtfCall.getType().getFieldList(), RelDataTypeField::getName));
+    }
+
+    // Verify that there is an alias for all the columns returned by the udtf call.
+    int udtfFieldCount = udtfCall.getType().getFieldCount();
+    if (colAliases.size() != udtfFieldCount) {
+      // Number of columns in the aliases does not match with number of columns
+      // generated by the lateral view
+      throw new SemanticException(ErrorMsg.UDTF_ALIAS_MISMATCH.getMsg(
+          "expected " + udtfFieldCount + " aliases " + "but got " + colAliases.size()));
+    }
+
+    return colAliases;
+  }
+
+  private List<RexNode> getOperandsForUDTF(ASTNode functionCall,
+      RowResolver inputRR) throws SemanticException {
+    List<RexNode> operands = new ArrayList<>();
+    TypeCheckCtx tcCtx = new TypeCheckCtx(inputRR, this.cluster.getRexBuilder(), false, false);
+    tcCtx.setUnparseTranslator(this.unparseTranslator);
+    // Start at 1 because value 0 is the function name.  Use the CalcitePlanner.genRexNode
+    // to retrieve the RexNode for all the function parameters.
+    for (int i = 1; i < functionCall.getChildren().size(); ++i) {
+      ASTNode functionParam = (ASTNode) functionCall.getChild(i);
+      operands.add(CalcitePlanner.genRexNode(functionParam, inputRR, tcCtx, this.conf));
+    }
+    return operands;
+  }
+
+  private RowResolver getOutputRR(RowResolver inputRR, RexCall udtfCall,
+      List<String> columnAliases, String lateralTableAlias) throws SemanticException {
+
+    RowResolver localOutputRR = new RowResolver();
+
+    // After calling RowResolver.add, localOutputRR will be mutated to contain the
+    // fields from inputRR
+    if (!RowResolver.add(localOutputRR, inputRR)) {
+      LOG.warn("Duplicates detected when adding columns to RR: see previous message");
+    }
+
+    // The RexNode return value for a udtf is always a struct.
+    TypeInfo typeInfo = TypeConverter.convert(udtfCall.getType());
+    Preconditions.checkState(typeInfo instanceof StructTypeInfo);
+
+    StructTypeInfo typeInfos = (StructTypeInfo) typeInfo;
+    // Match up the column alias with the return value of the udtf and
+    // place in the outputRR
+    for (int i = 0, j = 0; i < columnAliases.size(); i++) {

Review Comment:
   Should `j` be initialized before each do-while loop?





[GitHub] [hive] kasakrisz commented on a diff in pull request #4378: HIVE-27391: Refactor Calcite node generation from lateral views

Posted by "kasakrisz (via GitHub)" <gi...@apache.org>.
kasakrisz commented on code in PR #4378:
URL: https://github.com/apache/hive/pull/4378#discussion_r1226136211


##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveOptimizeInlineArrayTableFunctionRule.java:
##########
@@ -0,0 +1,128 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.optimizer.calcite.rules;
+
+import org.apache.calcite.plan.RelOptCluster;
+import org.apache.calcite.plan.RelOptRule;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.calcite.rel.type.RelDataTypeField;
+import org.apache.calcite.rex.RexCall;
+import org.apache.calcite.rex.RexInputRef;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.tools.RelBuilderFactory;
+import org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelFactories;
+import org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveTableFunctionScan;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.Lists;
+
+import java.util.ArrayList;
+import java.util.List;
+
+/**
+ * This rule optimizes the inline udtf in a HiveTableFunctionScan when it
+ * has an array of structures. The RelNode for a HiveTableFunctionScan places
+ * the input references as the first elements in the return type followed by
+ * the udtf return value which represents the items in the generated table. Take the
+ * case where the base (input) table has col1, and the inline function is represented by:
+ * inline(array( struct1(col2, col3), struct2(col2, col3), struct3(col2, col3), etc...)),
+ * ...and the return value for the table scan node is (col1, col2, col3). In this case,
+ * the same col1 value is joined with the structures within the inline array for the
+ * col2 and col3 values.
+ *
+ * The optimization is to put the "col1" value within the inline array, resulting in
+ * the new structure:
+ * inline(array(struct1(col1, col2, col3), struct2(col1, col2, col3), ...))
+ * By doing this, we avoid creating a lateral view join operator and a lateral view forward
+ * operator at runtime.
+ */
+public class HiveOptimizeInlineArrayTableFunctionRule extends RelOptRule {
+
+  public static final HiveOptimizeInlineArrayTableFunctionRule INSTANCE =
+          new HiveOptimizeInlineArrayTableFunctionRule(HiveRelFactories.HIVE_BUILDER);
+
+  public HiveOptimizeInlineArrayTableFunctionRule(RelBuilderFactory relBuilderFactory) {
+    super(operand(HiveTableFunctionScan.class, any()), relBuilderFactory, null);
+  }
+
+  @Override
+  public boolean matches(RelOptRuleCall call) {
+    final HiveTableFunctionScan tableFunctionScanRel = call.rel(0);
+
+    Preconditions.checkState(tableFunctionScanRel.getCall() instanceof RexCall);

Review Comment:
   I was thinking of something like this instead of precondition:
   ```
   if (!(tableFunctionScanRel.getCall() instanceof RexCall)) {
     return false;
   }
   ```
   So we would just silently not apply this rule. I'm not sure whether we would end up with an invalid plan; if so, we can keep the precondition, otherwise it is too aggressive.
   
   Btw I saw assertions elsewhere in the code but those are not run in prod so the outcome still depends on the plan.





[GitHub] [hive] sonarcloud[bot] commented on pull request #4378: HIVE-27391: Refactor Calcite node generation from lateral views

Posted by "sonarcloud[bot] (via GitHub)" <gi...@apache.org>.
sonarcloud[bot] commented on PR #4378:
URL: https://github.com/apache/hive/pull/4378#issuecomment-1581822471

   Kudos, SonarCloud Quality Gate passed! (https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=4378)
   
   0 Bugs (rating A)
   0 Vulnerabilities (rating A)
   0 Security Hotspots (rating A)
   14 Code Smells (rating A)
   
   No Coverage information
   No Duplication information
   
   




[GitHub] [hive] rkirtir commented on a diff in pull request #4378: HIVE-27391: Refactor Calcite node generation from lateral views

Posted by "rkirtir (via GitHub)" <gi...@apache.org>.
rkirtir commented on code in PR #4378:
URL: https://github.com/apache/hive/pull/4378#discussion_r1212824198


##########
ql/src/java/org/apache/hadoop/hive/ql/parse/type/HiveFunctionHelper.java:
##########


Review Comment:
   Can we have a unit test covering these lines?



##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveOptimizeInlineArrayTableFunctionRule.java:
##########


Review Comment:
   How do we test these rules?





[GitHub] [hive] scarlin-cloudera commented on a diff in pull request #4378: HIVE-27391: Refactor Calcite node generation from lateral views

Posted by "scarlin-cloudera (via GitHub)" <gi...@apache.org>.
scarlin-cloudera commented on code in PR #4378:
URL: https://github.com/apache/hive/pull/4378#discussion_r1222066641


##########
ql/src/java/org/apache/hadoop/hive/ql/parse/type/FunctionHelper.java:
##########
@@ -78,6 +79,12 @@ AggregateInfo getWindowAggregateFunctionInfo(boolean isDistinct, boolean isAllCo
       String aggregateName, List<RexNode> aggregateParameters)
       throws SemanticException;
 
+  /**
+   * Returns RexCall for UDTF based on given parameters
+   */
+  RexCall getUDTFFunction(String functionName, List<RexNode> operands)

Review Comment:
   I hope I made the change you wanted. It already mentioned returning a RexCall, but didn't mention anything about getting it from the FunctionRegistry. However, I want to be careful not to explicitly mention FunctionRegistry, because this is an interface and can theoretically be overridden by a third-party implementation.
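   For reference, one implementation-neutral wording for the Javadoc could look like this (a sketch, not the exact text committed):
   ```
   /**
    * Resolves the UDTF with the given name and returns a RexCall that
    * invokes it over the supplied operands.
    */
   RexCall getUDTFFunction(String functionName, List<RexNode> operands)
       throws SemanticException;
   ```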





[GitHub] [hive] scarlin-cloudera commented on a diff in pull request #4378: HIVE-27391: Refactor Calcite node generation from lateral views

Posted by "scarlin-cloudera (via GitHub)" <gi...@apache.org>.
scarlin-cloudera commented on code in PR #4378:
URL: https://github.com/apache/hive/pull/4378#discussion_r1222062960


##########
ql/src/java/org/apache/hadoop/hive/ql/parse/relnodegen/LateralViewPlan.java:
##########
@@ -0,0 +1,265 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.parse.relnodegen;
+
+import org.apache.calcite.plan.RelOptCluster;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.calcite.rel.type.RelDataTypeField;
+import org.apache.calcite.rex.RexCall;
+import org.apache.calcite.rex.RexNode;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.ErrorMsg;
+import org.apache.hadoop.hive.ql.exec.ColumnInfo;
+import org.apache.hadoop.hive.ql.lib.Node;
+import org.apache.hadoop.hive.ql.optimizer.calcite.TraitsUtil;
+import org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveTableFunctionScan;
+import org.apache.hadoop.hive.ql.optimizer.calcite.translator.TypeConverter;
+import org.apache.hadoop.hive.ql.parse.ASTErrorUtils;
+import org.apache.hadoop.hive.ql.parse.ASTNode;
+import org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer;
+import org.apache.hadoop.hive.ql.parse.CalcitePlanner;
+import org.apache.hadoop.hive.ql.parse.HiveParser;
+import org.apache.hadoop.hive.ql.parse.RowResolver;
+import org.apache.hadoop.hive.ql.parse.SemanticAnalyzer;
+import org.apache.hadoop.hive.ql.parse.SemanticException;
+import org.apache.hadoop.hive.ql.parse.UnparseTranslator;
+import org.apache.hadoop.hive.ql.parse.type.FunctionHelper;
+import org.apache.hadoop.hive.ql.parse.type.TypeCheckCtx;
+import org.apache.hadoop.hive.serde2.typeinfo.StructTypeInfo;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.ImmutableList;
+import com.google.common.collect.ImmutableSet;
+import com.google.common.collect.Lists;
+
+import java.util.ArrayList;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Set;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * LateralViewPlan is a helper class holding the objects needed for generating a Calcite
+ * plan from an ASTNode. The object is to be generated when a LATERAL_VIEW token is detected.
+ * From the ASTNode and relevant input node information, the following objects are created:
+ * * A HiveTableFunctionScan RelNode
+ * * A RowResolver containing the row resolving information output from the RelNode
+ * * The table alias for the table generated by the UDTF and Lateral View
+ */
+public class LateralViewPlan {
+  protected static final Logger LOG = LoggerFactory.getLogger(LateralViewPlan.class.getName());
+
+  // Only acceptable token types under the TOK_LATERAL_VIEW token.
+  public static final ImmutableSet<Integer> TABLE_ALIAS_TOKEN_TYPES =
+      ImmutableSet.of(HiveParser.TOK_SUBQUERY, HiveParser.TOK_TABREF, HiveParser.TOK_PTBLFUNCTION);
+
+  // The RelNode created for this lateral view
+  public final RelNode lateralViewRel;
+
+  // The output RowResolver created for this lateral view.
+  public final RowResolver outputRR;
+
+  // The alias provided for the lateral table in the query
+  public final String lateralTableAlias;
+
+  private final RelOptCluster cluster;
+  private final UnparseTranslator unparseTranslator;
+  private final HiveConf conf;
+  private final FunctionHelper functionHelper;
+
+  public LateralViewPlan(ASTNode lateralView, RelOptCluster cluster, RelNode inputRel,
+      RowResolver inputRR, UnparseTranslator unparseTranslator,
+      HiveConf conf, FunctionHelper functionHelper
+      ) throws SemanticException {
+    // initialize global variables containing helper information
+    this.cluster = cluster;
+    this.unparseTranslator = unparseTranslator;
+    this.conf = conf;
+    this.functionHelper = functionHelper;
+
+    // AST should have form of LATERAL_VIEW -> SELECT -> SELEXPR -> FUNCTION -> function info tree
+    ASTNode selExprAST = (ASTNode) lateralView.getChild(0).getChild(0);
+    ASTNode functionAST = (ASTNode) selExprAST.getChild(0);
+
+    this.lateralTableAlias = getTableAliasFromASTNode(selExprAST);
+
+    // The RexCall for the udtf function (e.g. inline)
+    RexCall udtfCall = getUDTFFunction(functionAST, inputRR);
+
+    // Column aliases provided by the query.
+    List<String> columnAliases = getColumnAliasesFromASTNode(selExprAST, udtfCall);
+
+    this.outputRR = getOutputRR(inputRR, udtfCall, columnAliases, this.lateralTableAlias);
+
+    RelDataType retType = getRetType(cluster, inputRel, udtfCall, columnAliases);
+
+    this.lateralViewRel = HiveTableFunctionScan.create(cluster,
+        TraitsUtil.getDefaultTraitSet(cluster), ImmutableList.of(inputRel), udtfCall,
+        null, retType, null);
+  }
+
+  public static void validateLateralView(ASTNode lateralView) throws SemanticException {
+    if (lateralView.getChildCount() != 2) {
+      throw new SemanticException("Token Lateral View contains " + lateralView.getChildCount() +
+          " children.");
+    }
+    ASTNode next = (ASTNode) lateralView.getChild(1);
+    if (!TABLE_ALIAS_TOKEN_TYPES.contains(next.getToken().getType()) &&
+          HiveParser.TOK_LATERAL_VIEW != next.getToken().getType()) {
+        throw new SemanticException(ASTErrorUtils.getMsg(
+            ErrorMsg.LATERAL_VIEW_INVALID_CHILD.getMsg(), lateralView));
+    }
+  }
+
+  private RexCall getUDTFFunction(ASTNode functionAST, RowResolver inputRR)
+      throws SemanticException {
+
+    String functionName = functionAST.getChild(0).getText().toLowerCase();
+
+    // create the RexNode operands for the UDTF RexCall
+    List<RexNode> operandsForUDTF = getOperandsForUDTF(functionAST, inputRR);
+
+    return this.functionHelper.getUDTFFunction(functionName, operandsForUDTF);
+  }
+
+  private String getTableAliasFromASTNode(ASTNode selExprClause) throws SemanticException {
+    // loop through the AST and find the TOK_TABALIAS object
+    for (Node obj : selExprClause.getChildren()) {
+      ASTNode child = (ASTNode) obj;
+      if (child.getToken().getType() == HiveParser.TOK_TABALIAS) {
+        return BaseSemanticAnalyzer.unescapeIdentifier(child.getChild(0).getText().toLowerCase());
+      }
+    }
+
+    // Parser enforces that table alias is added, but check again
+    throw new SemanticException("Alias should be specified LVJ");
+  }
+
+  private List<String> getColumnAliasesFromASTNode(ASTNode selExprClause,
+      RexCall udtfCall) throws SemanticException {
+    Set<String> uniqueNames = new HashSet<>();
+    List<String> colAliases = new ArrayList<>();
+    for (Node obj : selExprClause.getChildren()) {
+      ASTNode child = (ASTNode) obj;
+      // Skip the token values.  The rest should be the identifier column aliases
+      if (child.getToken().getType() == HiveParser.TOK_TABALIAS ||
+          child.getToken().getType() == HiveParser.TOK_FUNCTION) {
+        continue;
+      }
+      String colAlias = BaseSemanticAnalyzer.unescapeIdentifier(child.getText().toLowerCase());
+      if (uniqueNames.contains(colAlias)) {
+        // Column aliases defined by query for lateral view output are duplicated
+        throw new SemanticException(ErrorMsg.COLUMN_ALIAS_ALREADY_EXISTS.getMsg(colAlias));
+      }
+      uniqueNames.add(colAlias);
+      colAliases.add(colAlias);
+    }
+
+    // if no column aliases were provided, just retrieve them from the return type
+    // of the udtf RexCall
+    if (colAliases.isEmpty()) {
+      colAliases.addAll(
+          Lists.transform(udtfCall.getType().getFieldList(), RelDataTypeField::getName));
+    }
+
+    // Verify that there is an alias for all the columns returned by the udtf call.
+    int udtfFieldCount = udtfCall.getType().getFieldCount();
+    if (colAliases.size() != udtfFieldCount) {
+      // Number of columns in the aliases does not match with number of columns
+      // generated by the lateral view
+      throw new SemanticException(ErrorMsg.UDTF_ALIAS_MISMATCH.getMsg(
+          "expected " + udtfFieldCount + " aliases " + "but got " + colAliases.size()));

Review Comment:
   Done
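   For readers following the LateralViewPlan hunk above, here is a minimal usage sketch based only on the constructor signature and public fields shown in the diff; the planner-side variables (lateralViewAST, cluster, inputRel, inputRR, unparseTranslator, conf, functionHelper) are assumed to already exist in the caller, and the actual call site in the PR may differ:
   
   ```java
   // Sketch only: wiring assumed from the quoted diff, not taken from the PR's caller.
   LateralViewPlan.validateLateralView(lateralViewAST);
   LateralViewPlan lvPlan = new LateralViewPlan(lateralViewAST, cluster, inputRel,
       inputRR, unparseTranslator, conf, functionHelper);
   RelNode lateralViewRel = lvPlan.lateralViewRel;        // HiveTableFunctionScan over inputRel
   RowResolver lateralViewRR = lvPlan.outputRR;           // input columns plus the UDTF output columns
   String generatedTableAlias = lvPlan.lateralTableAlias; // alias of the table generated by the UDTF
   ```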





[GitHub] [hive] scarlin-cloudera commented on a diff in pull request #4378: HIVE-27391: Refactor Calcite node generation from lateral views

Posted by "scarlin-cloudera (via GitHub)" <gi...@apache.org>.
scarlin-cloudera commented on code in PR #4378:
URL: https://github.com/apache/hive/pull/4378#discussion_r1222051263


##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveOptimizeInlineArrayTableFunctionRule.java:
##########
@@ -0,0 +1,128 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.optimizer.calcite.rules;
+
+import org.apache.calcite.plan.RelOptCluster;
+import org.apache.calcite.plan.RelOptRule;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.calcite.rel.type.RelDataTypeField;
+import org.apache.calcite.rex.RexCall;
+import org.apache.calcite.rex.RexInputRef;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.tools.RelBuilderFactory;
+import org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelFactories;
+import org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveTableFunctionScan;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.Lists;
+
+import java.util.ArrayList;
+import java.util.List;
+
+/**
+ * This rule optimizes the inline udtf in a HiveTableFunctionScan when it
+ * has an array of structures. The RelNode for a HiveTableFunctionScan places
+ * the input references as the first elements in the return type followed by
+ * the udtf return value which represents the items in the generated table. Take the
+ * case where the base (input) table has col1, and the inline function is represented by:
+ * inline(array( struct1(col2, col3), struct2(col2, col3), struct3(col2, col3), etc...)),
+ * ...and the return value for the table scan node is (col1, col2, col3). In this case,
+ * the same col1 value is joined with the structures within the inline array for the
+ * col2 and col3 values.
+ *
+ * The optimization is to put the "col1" value within the inline array, resulting in
+ * the new structure:
+ * inline(array(struct1(col1, col2, col3), struct2(col1, col2, col3), ...))
+ * By doing this, we avoid creating a lateral view join operator and a lateral view forward
+ * operator at runtime.
+ */
+public class HiveOptimizeInlineArrayTableFunctionRule extends RelOptRule {
+
+  public static final HiveOptimizeInlineArrayTableFunctionRule INSTANCE =
+          new HiveOptimizeInlineArrayTableFunctionRule(HiveRelFactories.HIVE_BUILDER);
+
+  public HiveOptimizeInlineArrayTableFunctionRule(RelBuilderFactory relBuilderFactory) {
+    super(operand(HiveTableFunctionScan.class, any()), relBuilderFactory, null);
+  }
+
+  @Override
+  public boolean matches(RelOptRuleCall call) {
+    final HiveTableFunctionScan tableFunctionScanRel = call.rel(0);
+
+    Preconditions.checkState(tableFunctionScanRel.getCall() instanceof RexCall);
+    RexCall udtfCall = (RexCall) tableFunctionScanRel.getCall();
+    if (!udtfCall.getOperator().getName().equalsIgnoreCase("inline")) {

Review Comment:
   Yeah, this makes sense; I usually code things this way and I'm not sure why I didn't here.

   But it also made me think: should I use Hive's SqlFunctionConverter.reverseOperatorMap.get(SqlStdOperatorTable.ARRAY_VALUE_CONSTRUCTOR) to get the constant? And if I did, should I add a null check there to make sure it's valid? Or should I just keep it the way it is?

   I can't decide which is better. I like that approach because it should theoretically be the same constant string, but it seems a little awkward to code it that way. I would prefer a general constant for "array" within that class, but that feels like too much of a rewrite... or maybe I should just do it for this one function?
   
   Thoughts?
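   A minimal sketch of the reverseOperatorMap alternative weighed above, written against the array check quoted in the next hunk; the map lookup and its String return value are assumptions taken from this comment rather than verified Hive API, and the literal fallback keeps today's behaviour if the lookup ever misses:

   ```java
   // Hedged sketch: assumes SqlFunctionConverter.reverseOperatorMap maps Calcite
   // operators to Hive function names, as suggested in the comment above.
   String arrayFunctionName =
       SqlFunctionConverter.reverseOperatorMap.get(SqlStdOperatorTable.ARRAY_VALUE_CONSTRUCTOR);
   if (arrayFunctionName == null) {
     arrayFunctionName = "array";  // fall back to the existing literal
   }
   if (!firstOperand.getOperator().getName().equalsIgnoreCase(arrayFunctionName)) {
     return false;
   }
   ```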
   
   



##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveOptimizeInlineArrayTableFunctionRule.java:
##########
@@ -0,0 +1,128 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.optimizer.calcite.rules;
+
+import org.apache.calcite.plan.RelOptCluster;
+import org.apache.calcite.plan.RelOptRule;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.calcite.rel.type.RelDataTypeField;
+import org.apache.calcite.rex.RexCall;
+import org.apache.calcite.rex.RexInputRef;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.tools.RelBuilderFactory;
+import org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelFactories;
+import org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveTableFunctionScan;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.Lists;
+
+import java.util.ArrayList;
+import java.util.List;
+
+/**
+ * This rule optimizes the inline udtf in a HiveTableFunctionScan when it
+ * has an array of structures. The RelNode for a HiveTableFunctionScan places
+ * the input references as the first elements in the return type followed by
+ * the udtf return value which represents the items in the generated table. Take the
+ * case where the base (input) table has col1, and the inline function is represented by:
+ * inline(array( struct1(col2, col3), struct2(col2, col3), struct3(col2, col3), etc...)),
+ * ...and the return value for the table scan node is (col1, col2, col3). In this case,
+ * the same col1 value is joined with the structures within the inline array for the
+ * col2 and col3 values.
+ *
+ * The optimization is to put the "col1" value within the inline array, resulting in
+ * the new structure:
+ * inline(array(struct1(col1, col2, col3), struct2(col1, col2, col3), ...))
+ * By doing this, we avoid creating a lateral view join operator and a lateral view forward
+ * operator at runtime.
+ */
+public class HiveOptimizeInlineArrayTableFunctionRule extends RelOptRule {
+
+  public static final HiveOptimizeInlineArrayTableFunctionRule INSTANCE =
+          new HiveOptimizeInlineArrayTableFunctionRule(HiveRelFactories.HIVE_BUILDER);
+
+  public HiveOptimizeInlineArrayTableFunctionRule(RelBuilderFactory relBuilderFactory) {
+    super(operand(HiveTableFunctionScan.class, any()), relBuilderFactory, null);
+  }
+
+  @Override
+  public boolean matches(RelOptRuleCall call) {
+    final HiveTableFunctionScan tableFunctionScanRel = call.rel(0);
+
+    Preconditions.checkState(tableFunctionScanRel.getCall() instanceof RexCall);
+    RexCall udtfCall = (RexCall) tableFunctionScanRel.getCall();
+    if (!udtfCall.getOperator().getName().equalsIgnoreCase("inline")) {
+      return false;
+    }
+
+    Preconditions.checkState(!udtfCall.getOperands().isEmpty());
+    RexNode operand = udtfCall.getOperands().get(0);
+    if (!(operand instanceof RexCall)) {
+      return false;
+    }
+    RexCall firstOperand = (RexCall) operand;
+    if (!firstOperand.getOperator().getName().equalsIgnoreCase("array")) {

Review Comment:
   Same as above
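   To make the rule's col1/col2/col3 javadoc example concrete, a rough sketch of the central rewrite step; the rule's actual onMatch is not part of these hunks, and the variable names used here (inputRefs, arrayCall, rexBuilder) are illustrative assumptions only:

   ```java
   // For each struct inside the array operand, prepend the input references
   // (col1 in the javadoc example) so the base-table columns live inside the
   // array and no lateral view join/forward operators are needed at runtime.
   List<RexNode> newStructCalls = new ArrayList<>();
   for (RexNode structNode : arrayCall.getOperands()) {
     RexCall structCall = (RexCall) structNode;
     List<RexNode> newFields = new ArrayList<>(inputRefs);   // e.g. $0 for col1
     newFields.addAll(structCall.getOperands());             // col2, col3
     newStructCalls.add(rexBuilder.makeCall(structCall.getOperator(), newFields));
   }
   RexNode newArrayCall = rexBuilder.makeCall(arrayCall.getOperator(), newStructCalls);
   ```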





[GitHub] [hive] sonarcloud[bot] commented on pull request #4378: HIVE-27391: Refactor Calcite node generation from lateral views

Posted by "sonarcloud[bot] (via GitHub)" <gi...@apache.org>.
sonarcloud[bot] commented on PR #4378:
URL: https://github.com/apache/hive/pull/4378#issuecomment-1572836606

   Kudos, SonarCloud Quality Gate passed! (https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=4378)
   
   0 Bugs (rated A), 0 Vulnerabilities (rated A), 0 Security Hotspots (rated A), 13 Code Smells (rated A).
   
   No Coverage information. No Duplication information.
   
   




[GitHub] [hive] sonarcloud[bot] commented on pull request #4378: HIVE-27391: Refactor Calcite node generation from lateral views

Posted by "sonarcloud[bot] (via GitHub)" <gi...@apache.org>.
sonarcloud[bot] commented on PR #4378:
URL: https://github.com/apache/hive/pull/4378#issuecomment-1571636710

   Kudos, SonarCloud Quality Gate passed! (https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=4378)
   
   0 Bugs (rated A), 0 Vulnerabilities (rated A), 0 Security Hotspots (rated A), 13 Code Smells (rated A).
   
   No Coverage information. No Duplication information.
   
   




[GitHub] [hive] scarlin-cloudera commented on a diff in pull request #4378: HIVE-27391: Refactor Calcite node generation from lateral views

Posted by "scarlin-cloudera (via GitHub)" <gi...@apache.org>.
scarlin-cloudera commented on code in PR #4378:
URL: https://github.com/apache/hive/pull/4378#discussion_r1222070356


##########
ql/src/java/org/apache/hadoop/hive/ql/parse/relnodegen/LateralViewPlan.java:
##########
@@ -0,0 +1,265 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.parse.relnodegen;
+
+import org.apache.calcite.plan.RelOptCluster;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.calcite.rel.type.RelDataTypeField;
+import org.apache.calcite.rex.RexCall;
+import org.apache.calcite.rex.RexNode;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.ErrorMsg;
+import org.apache.hadoop.hive.ql.exec.ColumnInfo;
+import org.apache.hadoop.hive.ql.lib.Node;
+import org.apache.hadoop.hive.ql.optimizer.calcite.TraitsUtil;
+import org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveTableFunctionScan;
+import org.apache.hadoop.hive.ql.optimizer.calcite.translator.TypeConverter;
+import org.apache.hadoop.hive.ql.parse.ASTErrorUtils;
+import org.apache.hadoop.hive.ql.parse.ASTNode;
+import org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer;
+import org.apache.hadoop.hive.ql.parse.CalcitePlanner;
+import org.apache.hadoop.hive.ql.parse.HiveParser;
+import org.apache.hadoop.hive.ql.parse.RowResolver;
+import org.apache.hadoop.hive.ql.parse.SemanticAnalyzer;
+import org.apache.hadoop.hive.ql.parse.SemanticException;
+import org.apache.hadoop.hive.ql.parse.UnparseTranslator;
+import org.apache.hadoop.hive.ql.parse.type.FunctionHelper;
+import org.apache.hadoop.hive.ql.parse.type.TypeCheckCtx;
+import org.apache.hadoop.hive.serde2.typeinfo.StructTypeInfo;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.ImmutableList;
+import com.google.common.collect.ImmutableSet;
+import com.google.common.collect.Lists;
+
+import java.util.ArrayList;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Set;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * LateralViewPlan is a helper class holding the objects needed for generating a Calcite
+ * plan from an ASTNode. The object is created when a LATERAL_VIEW token is detected.
+ * From the ASTNode and relevant input node information, the following objects are created:
+ * * A HiveTableFunctionScan RelNode
+ * * A RowResolver containing the row resolving information output from the RelNode
+ * * The table alias for the table generated by the UDTF and Lateral View
+ */
+public class LateralViewPlan {
+  protected static final Logger LOG = LoggerFactory.getLogger(LateralViewPlan.class.getName());
+
+  // Only acceptable token types under the TOK_LATERAL_VIEW token.
+  public static final ImmutableSet<Integer> TABLE_ALIAS_TOKEN_TYPES =
+      ImmutableSet.of(HiveParser.TOK_SUBQUERY, HiveParser.TOK_TABREF, HiveParser.TOK_PTBLFUNCTION);
+
+  // The RelNode created for this lateral view
+  public final RelNode lateralViewRel;
+
+  // The output RowResolver created for this lateral view.
+  public final RowResolver outputRR;
+
+  // The alias provided for the lateral table in the query
+  public final String lateralTableAlias;
+
+  private final RelOptCluster cluster;
+  private final UnparseTranslator unparseTranslator;
+  private final HiveConf conf;
+  private final FunctionHelper functionHelper;
+
+  public LateralViewPlan(ASTNode lateralView, RelOptCluster cluster, RelNode inputRel,
+      RowResolver inputRR, UnparseTranslator unparseTranslator,
+      HiveConf conf, FunctionHelper functionHelper
+      ) throws SemanticException {
+    // initialize global variables containing helper information
+    this.cluster = cluster;
+    this.unparseTranslator = unparseTranslator;
+    this.conf = conf;
+    this.functionHelper = functionHelper;
+
+    // AST should have form of LATERAL_VIEW -> SELECT -> SELEXPR -> FUNCTION -> function info tree
+    ASTNode selExprAST = (ASTNode) lateralView.getChild(0).getChild(0);
+    ASTNode functionAST = (ASTNode) selExprAST.getChild(0);
+
+    this.lateralTableAlias = getTableAliasFromASTNode(selExprAST);
+
+    // The RexCall for the udtf function (e.g. inline)
+    RexCall udtfCall = getUDTFFunction(functionAST, inputRR);
+
+    // Column aliases provided by the query.
+    List<String> columnAliases = getColumnAliasesFromASTNode(selExprAST, udtfCall);
+
+    this.outputRR = getOutputRR(inputRR, udtfCall, columnAliases, this.lateralTableAlias);
+
+    RelDataType retType = getRetType(cluster, inputRel, udtfCall, columnAliases);
+
+    this.lateralViewRel = HiveTableFunctionScan.create(cluster,
+        TraitsUtil.getDefaultTraitSet(cluster), ImmutableList.of(inputRel), udtfCall,
+        null, retType, null);
+  }
+
+  public static void validateLateralView(ASTNode lateralView) throws SemanticException {
+    if (lateralView.getChildCount() != 2) {
+      throw new SemanticException("Token Lateral View contains " + lateralView.getChildCount() +
+          " children.");
+    }
+    ASTNode next = (ASTNode) lateralView.getChild(1);
+    if (!TABLE_ALIAS_TOKEN_TYPES.contains(next.getToken().getType()) &&
+          HiveParser.TOK_LATERAL_VIEW != next.getToken().getType()) {
+        throw new SemanticException(ASTErrorUtils.getMsg(
+            ErrorMsg.LATERAL_VIEW_INVALID_CHILD.getMsg(), lateralView));
+    }
+  }
+
+  private RexCall getUDTFFunction(ASTNode functionAST, RowResolver inputRR)
+      throws SemanticException {
+
+    String functionName = functionAST.getChild(0).getText().toLowerCase();
+
+    // create the RexNode operands for the UDTF RexCall
+    List<RexNode> operandsForUDTF = getOperandsForUDTF(functionAST, inputRR);
+
+    return this.functionHelper.getUDTFFunction(functionName, operandsForUDTF);
+  }
+
+  private String getTableAliasFromASTNode(ASTNode selExprClause) throws SemanticException {
+    // loop through the AST and find the TOK_TABALIAS object
+    for (Node obj : selExprClause.getChildren()) {
+      ASTNode child = (ASTNode) obj;
+      if (child.getToken().getType() == HiveParser.TOK_TABALIAS) {
+        return BaseSemanticAnalyzer.unescapeIdentifier(child.getChild(0).getText().toLowerCase());
+      }
+    }
+
+    // Parser enforces that table alias is added, but check again
+    throw new SemanticException("Alias should be specified LVJ");
+  }
+
+  private List<String> getColumnAliasesFromASTNode(ASTNode selExprClause,
+      RexCall udtfCall) throws SemanticException {
+    Set<String> uniqueNames = new HashSet<>();
+    List<String> colAliases = new ArrayList<>();
+    for (Node obj : selExprClause.getChildren()) {
+      ASTNode child = (ASTNode) obj;
+      // Skip the token values.  The rest should be the identifier column aliases
+      if (child.getToken().getType() == HiveParser.TOK_TABALIAS ||
+          child.getToken().getType() == HiveParser.TOK_FUNCTION) {
+        continue;
+      }
+      String colAlias = BaseSemanticAnalyzer.unescapeIdentifier(child.getText().toLowerCase());
+      if (uniqueNames.contains(colAlias)) {
+        // Column aliases defined by query for lateral view output are duplicated
+        throw new SemanticException(ErrorMsg.COLUMN_ALIAS_ALREADY_EXISTS.getMsg(colAlias));
+      }
+      uniqueNames.add(colAlias);
+      colAliases.add(colAlias);
+    }
+
+    // if no column aliases were provided, just retrieve them from the return type
+    // of the udtf RexCall
+    if (colAliases.isEmpty()) {
+      colAliases.addAll(
+          Lists.transform(udtfCall.getType().getFieldList(), RelDataTypeField::getName));
+    }
+
+    // Verify that there is an alias for all the columns returned by the udtf call.
+    int udtfFieldCount = udtfCall.getType().getFieldCount();
+    if (colAliases.size() != udtfFieldCount) {
+      // Number of columns in the aliases does not match with number of columns
+      // generated by the lateral view
+      throw new SemanticException(ErrorMsg.UDTF_ALIAS_MISMATCH.getMsg(
+          "expected " + udtfFieldCount + " aliases " + "but got " + colAliases.size()));
+    }
+
+    return colAliases;
+  }
+
+  private List<RexNode> getOperandsForUDTF(ASTNode functionCall,
+      RowResolver inputRR) throws SemanticException {
+    List<RexNode> operands = new ArrayList<>();
+    TypeCheckCtx tcCtx = new TypeCheckCtx(inputRR, this.cluster.getRexBuilder(), false, false);
+    tcCtx.setUnparseTranslator(this.unparseTranslator);
+    // Start at 1 because child 0 is the function name.  Use CalcitePlanner.genRexNode
+    // to retrieve the RexNode for all the function parameters.
+    for (int i = 1; i < functionCall.getChildren().size(); ++i) {
+      ASTNode functionParam = (ASTNode) functionCall.getChild(i);
+      operands.add(CalcitePlanner.genRexNode(functionParam, inputRR, tcCtx, this.conf));
+    }
+    return operands;
+  }
+
+  private RowResolver getOutputRR(RowResolver inputRR, RexCall udtfCall,
+      List<String> columnAliases, String lateralTableAlias) throws SemanticException {
+
+    RowResolver localOutputRR = new RowResolver();
+
+    // RowResolver.add mutates localOutputRR in place, copying in the fields from the
+    // input RowResolver.
+    if (!RowResolver.add(localOutputRR, inputRR)) {
+      LOG.warn("Duplicates detected when adding columns to RR: see previous message");
+    }
+
+    // The RexNode return value for a udtf is always a struct.
+    TypeInfo typeInfo = TypeConverter.convert(udtfCall.getType());
+    Preconditions.checkState(typeInfo instanceof StructTypeInfo);
+
+    StructTypeInfo typeInfos = (StructTypeInfo) typeInfo;
+    // Match up the column alias with the return value of the udtf and
+    // place in the outputRR
+    for (int i = 0, j = 0; i < columnAliases.size(); i++) {

Review Comment:
   That's a good question. I was a little lazy here: I grabbed the whole for loop from the original code and didn't want to rewrite it in case I missed something. This piece of code does seem a bit awkward, though, and could probably use some rewriting.

   I think it's okay as it is, though. Essentially, once a generated internal column name is grabbed, it counts as "used". If we reset "j" before the do-while loop instead of in the "for" loop, it goes back to zero, but the do-while still advances past every name that is already in the outputRR until it reaches the next unused value, so the result is the same.
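   A rough sketch of the alias-to-field pairing described here, which also shows why the reset point of "j" does not change the outcome; the helper calls (getColumnInternalName, getPosition, put) are assumptions about surrounding code that is not fully quoted in this hunk:

   ```java
   // Pair the i-th column alias with the i-th field of the UDTF's struct return
   // type, and pick an internal name the output RowResolver does not already use.
   // Because already-used names are skipped, resetting j inside or outside the
   // loop yields the same sequence of unused names.
   for (int i = 0, j = 0; i < columnAliases.size(); i++) {
     String internalName;
     do {
       internalName = SemanticAnalyzer.getColumnInternalName(j++);
     } while (localOutputRR.getPosition(internalName) != -1);
     ColumnInfo colInfo = new ColumnInfo(internalName,
         typeInfos.getAllStructFieldTypeInfos().get(i), lateralTableAlias, false);
     localOutputRR.put(lateralTableAlias, columnAliases.get(i), colInfo);
   }
   ```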





[GitHub] [hive] scarlin-cloudera commented on a diff in pull request #4378: HIVE-27391: Refactor Calcite node generation from lateral views

Posted by "scarlin-cloudera (via GitHub)" <gi...@apache.org>.
scarlin-cloudera commented on code in PR #4378:
URL: https://github.com/apache/hive/pull/4378#discussion_r1222062080


##########
ql/src/java/org/apache/hadoop/hive/ql/parse/type/HiveFunctionHelper.java:
##########
@@ -448,6 +451,23 @@ public AggregateInfo getWindowAggregateFunctionInfo(boolean isDistinct, boolean
         new AggregateInfo(aggregateParameters, returnType, aggregateName, isDistinct) : null;
   }
 
+  public RexCall getUDTFFunction(String functionName, List<RexNode> operands)
+      throws SemanticException {
+    // Extract the argument types for the operands into a list
+    List<RelDataType> operandTypes = Lists.transform(operands, RexNode::getType);
+
+    FunctionInfo functionInfo = FunctionRegistry.getFunctionInfo(functionName);
+    GenericUDTF genericUDTF = functionInfo.getGenericUDTF();
+    Preconditions.checkNotNull(genericUDTF);

Review Comment:
   Done
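   Since the bare Preconditions call was the review point here, a hedged sketch of the kind of user-facing check often preferred over an assertion; the actual change made for "Done" is not shown in this thread, and the error text below is illustrative only:

   ```java
   // Sketch only: surface a SemanticException naming the function instead of
   // failing with Preconditions.checkNotNull when the function is not a UDTF.
   GenericUDTF genericUDTF = functionInfo.getGenericUDTF();
   if (genericUDTF == null) {
     throw new SemanticException(functionName + " is not a table-generating function");
   }
   ```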


