You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/06/27 08:04:26 UTC

[GitHub] [spark] zhengruifeng opened a new pull request, #37000: [SPARK-39615][SQL][WIP] Make listColumns be compatible with 3 layer namespace

zhengruifeng opened a new pull request, #37000:
URL: https://github.com/apache/spark/pull/37000

   ### What changes were proposed in this pull request?
   Make listColumns be compatible with 3 layer namespace
   
   
   ### Why are the changes needed?
   for 3 layer namespace compatiblity
   
   
   ### Does this PR introduce _any_ user-facing change?
   Yes
   
   
   ### How was this patch tested?
   added UT
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] amaliujia commented on a diff in pull request #37000: [SPARK-39615][SQL] Make listColumns be compatible with 3 layer namespace

Posted by GitBox <gi...@apache.org>.

amaliujia commented on code in PR #37000:
URL: https://github.com/apache/spark/pull/37000#discussion_r907955439


##########
sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala:
##########
@@ -23,14 +23,15 @@ import scala.util.control.NonFatal
 import org.apache.spark.sql._
 import org.apache.spark.sql.catalog.{Catalog, CatalogMetadata, Column, Database, Function, Table}
 import org.apache.spark.sql.catalyst.{DefinedByConstructorParams, FunctionIdentifier, TableIdentifier}
-import org.apache.spark.sql.catalyst.analysis.{ResolvedNamespace, ResolvedTable, ResolvedView, UnresolvedDBObjectName, UnresolvedNamespace, UnresolvedTable, UnresolvedTableOrView}
+import org.apache.spark.sql.catalyst.analysis._
 import org.apache.spark.sql.catalyst.catalog._
 import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
-import org.apache.spark.sql.catalyst.plans.logical.{CreateTable, LocalRelation, RecoverPartitions, ShowTables, SubqueryAlias, TableSpec, View}
+import org.apache.spark.sql.catalyst.plans.logical._

Review Comment:
   revert 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] amaliujia commented on a diff in pull request #37000: [SPARK-39615][SQL] Make listColumns be compatible with 3 layer namespace

Posted by GitBox <gi...@apache.org>.

amaliujia commented on code in PR #37000:
URL: https://github.com/apache/spark/pull/37000#discussion_r907945624


##########
sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala:
##########
@@ -208,8 +209,18 @@ class CatalogImpl(sparkSession: SparkSession) extends Catalog {
    */
   @throws[AnalysisException]("table does not exist")
   override def listColumns(tableName: String): Dataset[Column] = {
-    val tableIdent = sparkSession.sessionState.sqlParser.parseTableIdentifier(tableName)
-    listColumns(tableIdent)
+    // calling `sqlParser.parseTableIdentifier` to parse tableName. If it contains only table name
+    // and optionally contains a database name(thus a TableIdentifier), then we look up the table in
+    // sessionCatalog. Otherwise we try `sqlParser.parseMultipartIdentifier` to have a sequence of
+    // string as the qualified identifier and resolve the table through SQL analyzer.
+    try {
+      val tableIdent = sparkSession.sessionState.sqlParser.parseTableIdentifier(tableName)
+      listColumns(tableIdent)

Review Comment:
   I think we only need one listColumns? 
   
   If tableIdent is in SessionCatalog, then we convert it `spark_catalog.${tableIdent.database}.${tableIdent.name}` then all use `listColumns(ident: Seq[String])`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] amaliujia commented on a diff in pull request #37000: [SPARK-39615][SQL] Make listColumns be compatible with 3 layer namespace

Posted by GitBox <gi...@apache.org>.

amaliujia commented on code in PR #37000:
URL: https://github.com/apache/spark/pull/37000#discussion_r907948285


##########
sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala:
##########
@@ -208,8 +209,18 @@ class CatalogImpl(sparkSession: SparkSession) extends Catalog {
    */
   @throws[AnalysisException]("table does not exist")
   override def listColumns(tableName: String): Dataset[Column] = {
-    val tableIdent = sparkSession.sessionState.sqlParser.parseTableIdentifier(tableName)
-    listColumns(tableIdent)
+    // calling `sqlParser.parseTableIdentifier` to parse tableName. If it contains only table name
+    // and optionally contains a database name(thus a TableIdentifier), then we look up the table in
+    // sessionCatalog. Otherwise we try `sqlParser.parseMultipartIdentifier` to have a sequence of
+    // string as the qualified identifier and resolve the table through SQL analyzer.
+    try {
+      val tableIdent = sparkSession.sessionState.sqlParser.parseTableIdentifier(tableName)
+      listColumns(tableIdent)

Review Comment:
   Similarly as `getTable`: https://github.com/apache/spark/blob/b88cabb754073e69ce82fd561e6eac15046e633e/sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala#L260



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] zhengruifeng closed pull request #37000: [SPARK-39615][SQL] Make listColumns be compatible with 3 layer namespace

Posted by GitBox <gi...@apache.org>.

zhengruifeng closed pull request #37000: [SPARK-39615][SQL] Make listColumns be compatible with 3 layer namespace
URL: https://github.com/apache/spark/pull/37000


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] amaliujia commented on a diff in pull request #37000: [SPARK-39615][SQL] Make listColumns be compatible with 3 layer namespace

Posted by GitBox <gi...@apache.org>.

amaliujia commented on code in PR #37000:
URL: https://github.com/apache/spark/pull/37000#discussion_r907945624


##########
sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala:
##########
@@ -208,8 +209,18 @@ class CatalogImpl(sparkSession: SparkSession) extends Catalog {
    */
   @throws[AnalysisException]("table does not exist")
   override def listColumns(tableName: String): Dataset[Column] = {
-    val tableIdent = sparkSession.sessionState.sqlParser.parseTableIdentifier(tableName)
-    listColumns(tableIdent)
+    // calling `sqlParser.parseTableIdentifier` to parse tableName. If it contains only table name
+    // and optionally contains a database name(thus a TableIdentifier), then we look up the table in
+    // sessionCatalog. Otherwise we try `sqlParser.parseMultipartIdentifier` to have a sequence of
+    // string as the qualified identifier and resolve the table through SQL analyzer.
+    try {
+      val tableIdent = sparkSession.sessionState.sqlParser.parseTableIdentifier(tableName)
+      listColumns(tableIdent)

Review Comment:
   I think we only need one listColumns? 
   
   If tableIdent is in SessionCatalog, then we convert it `spark_catalog.${tableIdent.database}.${tableIdent.name}`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] amaliujia commented on a diff in pull request #37000: [SPARK-39615][SQL] Make listColumns be compatible with 3 layer namespace

Posted by GitBox <gi...@apache.org>.

amaliujia commented on code in PR #37000:
URL: https://github.com/apache/spark/pull/37000#discussion_r907955439


##########
sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala:
##########
@@ -23,14 +23,15 @@ import scala.util.control.NonFatal
 import org.apache.spark.sql._
 import org.apache.spark.sql.catalog.{Catalog, CatalogMetadata, Column, Database, Function, Table}
 import org.apache.spark.sql.catalyst.{DefinedByConstructorParams, FunctionIdentifier, TableIdentifier}
-import org.apache.spark.sql.catalyst.analysis.{ResolvedNamespace, ResolvedTable, ResolvedView, UnresolvedDBObjectName, UnresolvedNamespace, UnresolvedTable, UnresolvedTableOrView}
+import org.apache.spark.sql.catalyst.analysis._
 import org.apache.spark.sql.catalyst.catalog._
 import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
-import org.apache.spark.sql.catalyst.plans.logical.{CreateTable, LocalRelation, RecoverPartitions, ShowTables, SubqueryAlias, TableSpec, View}
+import org.apache.spark.sql.catalyst.plans.logical._

Review Comment:
   revert and same for others



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] zhengruifeng commented on pull request #37000: [SPARK-39615][SQL] Make listColumns be compatible with 3 layer namespace

Posted by GitBox <gi...@apache.org>.

zhengruifeng commented on PR #37000:
URL: https://github.com/apache/spark/pull/37000#issuecomment-1168094141

   cc @cloud-fan @amaliujia 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] zhengruifeng commented on pull request #37000: [SPARK-39615][SQL] Make listColumns be compatible with 3 layer namespace

Posted by GitBox <gi...@apache.org>.

zhengruifeng commented on PR #37000:
URL: https://github.com/apache/spark/pull/37000#issuecomment-1169452560

   merged to master, thank you @cloud-fan @amaliujia 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] amaliujia commented on a diff in pull request #37000: [SPARK-39615][SQL] Make listColumns be compatible with 3 layer namespace

Posted by GitBox <gi...@apache.org>.

amaliujia commented on code in PR #37000:
URL: https://github.com/apache/spark/pull/37000#discussion_r907945624


##########
sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala:
##########
@@ -208,8 +209,18 @@ class CatalogImpl(sparkSession: SparkSession) extends Catalog {
    */
   @throws[AnalysisException]("table does not exist")
   override def listColumns(tableName: String): Dataset[Column] = {
-    val tableIdent = sparkSession.sessionState.sqlParser.parseTableIdentifier(tableName)
-    listColumns(tableIdent)
+    // calling `sqlParser.parseTableIdentifier` to parse tableName. If it contains only table name
+    // and optionally contains a database name(thus a TableIdentifier), then we look up the table in
+    // sessionCatalog. Otherwise we try `sqlParser.parseMultipartIdentifier` to have a sequence of
+    // string as the qualified identifier and resolve the table through SQL analyzer.
+    try {
+      val tableIdent = sparkSession.sessionState.sqlParser.parseTableIdentifier(tableName)
+      listColumns(tableIdent)

Review Comment:
   You can first check if tableIdent is in Session catalog by `tableExists`?
   
   If so fetch from SessionCatalog, otherwise go  `listColumns(ident)`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org