You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/04/23 15:15:21 UTC

[GitHub] [spark] maropu commented on a change in pull request #32303: [SPARK-34382][SQL] Support LATERAL subqueries

maropu commented on a change in pull request #32303:
URL: https://github.com/apache/spark/pull/32303#discussion_r619272576



##########
File path: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
##########
@@ -631,6 +631,10 @@ lateralView
     : LATERAL VIEW (OUTER)? qualifiedName '(' (expression (',' expression)*)? ')' tblName=identifier (AS? colName+=identifier (',' colName+=identifier)*)?
     ;
 
+lateralClause
+    : LATERAL right=relationPrimary
+    ;

Review comment:
       It seems the parser of PostgreSQL handles this lateral clause in the table definition; https://github.com/postgres/postgres/blob/master/src/backend/parser/gram.y#L12091-L12117
   So, could we write it like this with reference to it?
   ```
   relation
       : relationPrimary joinRelation*
       : LATERAL right=relationPrimary ....
       ;
   
   ```

##########
File path: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
##########
@@ -588,7 +588,7 @@ hintStatement
     ;
 
 fromClause
-    : FROM relation (',' relation)* lateralView* pivotClause?
+    : FROM relation (',' lateralClause)* (',' relation)* lateralView* pivotClause?

Review comment:
       A leftmost lateral clause cannot be accepted? It seems PostgreSQL can;
   ```
   postgres=# SELECT * FROM LATERAL (SELECT c1 from t2) tx, t1;
    c1 | c1 | c2 
   ----+----+----
     0 |  0 |  1
     0 |  1 |  2
     0 |  0 |  1
     0 |  1 |  2
   (4 rows)
   ```

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala
##########
@@ -392,16 +392,27 @@ case class UnresolvedStar(target: Option[Seq[String]]) extends Star with Unevalu
           throw QueryCompilationErrors.starExpandDataTypeNotSupportedError(target.get)
       }
     } else {
-      val from = input.inputSet.map(_.name).mkString(", ")
-      val targetString = target.get.mkString(".")
-      throw QueryCompilationErrors.cannotResolveStarExpandGivenInputColumnsError(
-        targetString, from)
+      // Change the unresolved star to an expanded star to indicate the star has been
+      // expanded once.
+      ExpandedStar(this) :: Nil
     }
   }
 
   override def toString: String = target.map(_ + ".").getOrElse("") + "*"
 }
 
+/**
+ * Represent an [[UnresolvedStar]] that has been to expand once.
+ * This can be used to resolve a star expression using multiple plans.
+ */
+case class ExpandedStar(star: UnresolvedStar) extends Star with Unevaluable {

Review comment:
       Why do we need this?

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##########
@@ -1638,6 +1660,27 @@ class Analyzer(override val catalogManager: CatalogManager)
     }
   }
 
+  /**
+   * Optionally resolve the name parts using the outer query plan and wrap resolved attributes
+   * with [[OuterReference]]s.
+   */
+  private def resolveOuterReference(

Review comment:
       Just a question; instead of using the outer reference, we cannot simply pull up a filter condition into a join condition?
   For example, in a query below, can an analyzer converts`'Join LateralJoin(Inner)` into `'Join LateralJoin(Inner), 't1.c1 = c1#171`, then resolves `'t1.c1` by `ResolveReferences`?
   ```
    +- 'Join LateralJoin(Inner)
       :- SubqueryAlias spark_catalog.default.t1
       :  +- View (`default`.`t1`, [c1#167,c2#168])
       :     +- Project [cast(col1#169 as int) AS c1#167, cast(col2#170 as int) AS c2#168]
       :        +- LocalRelation [col1#169, col2#170]
       +- 'SubqueryAlias __auto_generated_subquery_name
          +- 'Project ['c2]
             +- 'Filter ('t1.c1 = c1#171)
                +- SubqueryAlias spark_catalog.default.t2
                   +- View (`default`.`t2`, [c1#171,c2#172])
                      +- Project [cast(col1#173 as int) AS c1#171, cast(col2#174 as int) AS c2#172]
                         +- LocalRelation [col1#173, col2#174]
   ```

##########
File path: sql/core/src/test/resources/sql-tests/inputs/join-lateral.sql
##########
@@ -0,0 +1,84 @@
+-- Test cases for lateral join

Review comment:
       Could you activate some tests in `postgreSQL/join.sql`?;
   https://github.com/apache/spark/blob/e503b9c462676c381ba51ad48edd968d6e184fc2/sql/core/src/test/resources/sql-tests/inputs/postgreSQL/join.sql#L1278

##########
File path: sql/core/src/test/resources/sql-tests/inputs/join-lateral.sql
##########
@@ -0,0 +1,84 @@
+-- Test cases for lateral join
+
+CREATE VIEW t1(c1, c2) AS VALUES (0, 1), (1, 2);
+CREATE VIEW t2(c1, c2) AS VALUES (0, 2), (0, 3);
+
+-- lateral join with single column select
+SELECT * FROM t1, LATERAL (SELECT c1);

Review comment:
       Is this compiled into a join query? We cannot compile it into a simple scan query like PostgreSQL?
   ```
   postgres=# explain SELECT * FROM t1, LATERAL (SELECT c1) t2;
                             QUERY PLAN                          
   --------------------------------------------------------------
    Values Scan on "*VALUES*"  (cost=0.00..0.03 rows=2 width=12)
   (1 row)
   ```

##########
File path: sql/core/src/test/resources/sql-tests/inputs/join-lateral.sql
##########
@@ -0,0 +1,84 @@
+-- Test cases for lateral join
+
+CREATE VIEW t1(c1, c2) AS VALUES (0, 1), (1, 2);
+CREATE VIEW t2(c1, c2) AS VALUES (0, 2), (0, 3);
+
+-- lateral join with single column select
+SELECT * FROM t1, LATERAL (SELECT c1);

Review comment:
       Btw, in PostgreSQL, it seems a table alias must be needed for this literal query;
   ```
   postgres=# SELECT * FROM t1, LATERAL (SELECT c1);
   2021-04-23 23:43:11.763 JST [43125] ERROR:  subquery in FROM must have an alias at character 27
   2021-04-23 23:43:11.763 JST [43125] HINT:  For example, FROM (SELECT ...) [AS] foo.
   2021-04-23 23:43:11.763 JST [43125] STATEMENT:  SELECT * FROM t1, LATERAL (SELECT c1);
   ERROR:  subquery in FROM must have an alias
   LINE 1: SELECT * FROM t1, LATERAL (SELECT c1);
                                     ^
   HINT:  For example, FROM (SELECT ...) [AS] foo.
   ```
   Could you check the ANSI/SQL definition or the other database behaviours?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org