You are viewing a plain text version of this content. The canonical link for it is here.
Posted to gitbox@hive.apache.org by GitBox <gi...@apache.org> on 2021/01/13 15:23:27 UTC

[GitHub] [hive] kasakrisz opened a new pull request #1865: HIVE-24633: Support CTE with column labels

kasakrisz opened a new pull request #1865:
URL: https://github.com/apache/hive/pull/1865


   ### What changes were proposed in this pull request?
   1. Improve the parser to accept CTE clause with `with column list` specified:
   ```
   WITH cte(a, b) AS ...
   ```
   2. When transforming subquery AST tree to Calcite RelNode tree a new RowResolver is created for the subquery's top node to point its alias. Extend this logic with assign the `with column list` elements to each entry if explicitly specified in the `WITH` clause.
   
   ### Why are the changes needed?
   SQL standard enables this feature.
   
   ### Does this PR introduce _any_ user-facing change?
   Yes. When users specify `with column list` the list elements must be used to reference expressions in the CTE' select clause from the main query.
   
   ### How was this patch tested?
   ```
   mvn test -DskipSparkTests -Dtest=TestMiniLlapLocalCliDriver -Dqfile=cte_8.q -pl itests/qtest -Pitests
   mvn test -DskipSparkTests -Dtest=TestMiniLlapLocalCliDriver -Dqfile=cte_mat_1.q -pl itests/qtest -Pitests
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] jcamachor commented on a change in pull request #1865: HIVE-24633: Support CTE with column labels

Posted by GitBox <gi...@apache.org>.
jcamachor commented on a change in pull request #1865:
URL: https://github.com/apache/hive/pull/1865#discussion_r559202460



##########
File path: ql/src/test/queries/clientpositive/cte_8.q
##########
@@ -0,0 +1,34 @@
+set hive.cli.print.header=true;
+
+create table t1(int_col int, bigint_col bigint);
+
+insert into t1 values(1, 2), (3, 4);
+
+explain cbo
+with cte1(a, b) as (select int_col x, bigint_col y from t1)
+select a, b from cte1;
+
+with cte1(a, b) as (select int_col x, bigint_col y from t1)
+select a, b from cte1;
+
+with cte1(a) as (select int_col x, bigint_col y from t1)

Review comment:
       What happens for the following query?
   ```
   with cte1(a) as (select int_col x, bigint_col a from t1)
   ...
   ```
   Basically the `a` alias will clash. What is our behavior compared to other RDBMS that allows this such as PG? Can we add the test?
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] jcamachor merged pull request #1865: HIVE-24633: Support CTE with column labels

Posted by GitBox <gi...@apache.org>.
jcamachor merged pull request #1865:
URL: https://github.com/apache/hive/pull/1865


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kasakrisz commented on a change in pull request #1865: HIVE-24633: Support CTE with column labels

Posted by GitBox <gi...@apache.org>.
kasakrisz commented on a change in pull request #1865:
URL: https://github.com/apache/hive/pull/1865#discussion_r559440784



##########
File path: ql/src/test/queries/clientpositive/cte_8.q
##########
@@ -0,0 +1,34 @@
+set hive.cli.print.header=true;
+
+create table t1(int_col int, bigint_col bigint);
+
+insert into t1 values(1, 2), (3, 4);
+
+explain cbo
+with cte1(a, b) as (select int_col x, bigint_col y from t1)
+select a, b from cte1;
+
+with cte1(a, b) as (select int_col x, bigint_col y from t1)
+select a, b from cte1;
+
+with cte1(a) as (select int_col x, bigint_col y from t1)

Review comment:
       Our behavior is the same as Postgres: ambiguous column reference is only allowed when the main query has `select *`
   If the main query has an explicit reference to the ambiguous column in any clause the query won't compile.
   Added test case for both scenario.
   
   The difference between Hive and Postgres is the final column names int the result set:
   ```
   with cte1(a) as (select int_col x, bigint_col a from t1)
   select * from cte1;
   ``` 
   Postgres: `a, a`
   Hive: `cte1.a	cte1._col1`
   
   After some research I found that we alter the column name to its internal name because CBO cannot handle ambiguous column names: [HIVE-19770](https://issues.apache.org/jira/browse/HIVE-19770)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kasakrisz commented on a change in pull request #1865: HIVE-24633: Support CTE with column labels

Posted by GitBox <gi...@apache.org>.
kasakrisz commented on a change in pull request #1865:
URL: https://github.com/apache/hive/pull/1865#discussion_r559440784



##########
File path: ql/src/test/queries/clientpositive/cte_8.q
##########
@@ -0,0 +1,34 @@
+set hive.cli.print.header=true;
+
+create table t1(int_col int, bigint_col bigint);
+
+insert into t1 values(1, 2), (3, 4);
+
+explain cbo
+with cte1(a, b) as (select int_col x, bigint_col y from t1)
+select a, b from cte1;
+
+with cte1(a, b) as (select int_col x, bigint_col y from t1)
+select a, b from cte1;
+
+with cte1(a) as (select int_col x, bigint_col y from t1)

Review comment:
       Our behavior is the same as Postgres: ambiguous column reference is only allowed when the main query has `select *`
   If the main query has an explicit reference to the ambiguous column the query won't compile.
   Added test case for both scenario.
   
   The difference between Hive and Postgres is the final column names int the result set:
   ```
   with cte1(a) as (select int_col x, bigint_col a from t1)
   select * from cte1;
   ``` 
   Postgres: `a, a`
   Hive: `cte1.a	cte1._col1`
   
   After some research I found that we alter the column name to its internal name because CBO cannot handle ambiguous column names: [HIVE-19770](https://issues.apache.org/jira/browse/HIVE-19770)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] jcamachor merged pull request #1865: HIVE-24633: Support CTE with column labels

Posted by GitBox <gi...@apache.org>.
jcamachor merged pull request #1865:
URL: https://github.com/apache/hive/pull/1865


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org