You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/05/23 21:50:33 UTC

[GitHub] [spark] amaliujia opened a new pull request, #36641: [SPARK-39263] Make GetTable, TableExists and DatabaseExists be compatible with 3 layer namespace

amaliujia opened a new pull request, #36641:
URL: https://github.com/apache/spark/pull/36641

   <!--
   Thanks for sending a pull request!  Here are some tips for you:
     1. If this is your first time, please read our contributor guidelines: https://spark.apache.org/contributing.html
     2. Ensure you have added or run the appropriate tests for your PR: https://spark.apache.org/developer-tools.html
     3. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP][SPARK-XXXX] Your PR title ...'.
     4. Be sure to keep the PR description updated to reflect all changes.
     5. Please write your PR title to summarize what this PR proposes.
     6. If possible, provide a concise example to reproduce the issue for a faster review.
     7. If you want to add a new configuration, please read the guideline first for naming configurations in
        'core/src/main/scala/org/apache/spark/internal/config/ConfigEntry.scala'.
     8. If you want to add or modify an error type or message, please read the guideline first in
        'core/src/main/resources/error/README.md'.
   -->
   
   ### What changes were proposed in this pull request?
   <!--
   Please clarify what changes you are proposing. The purpose of this section is to outline the changes and how this PR fixes the issue. 
   If possible, please consider writing useful notes for better and faster reviews in your PR. See the examples below.
     1. If you refactor some codes with changing classes, showing the class hierarchy will help reviewers.
     2. If you fix some SQL features, you can provide some references of other DBMSes.
     3. If there is design documentation, please add the link.
     4. If there is a discussion in the mailing list, please add the link.
   -->
   
   Make GetTable, TableExists and DatabaseExists be compatible with 3 layer namespace
   
   ### Why are the changes needed?
   <!--
   Please clarify why the changes are needed. For instance,
     1. If you propose a new API, clarify the use case for a new API.
     2. If you fix a bug, you can clarify why it is a bug.
   -->
   
   This is a part of effort to make catalog API be compatible with 3 layer namespace
   
   ### Does this PR introduce _any_ user-facing change?
   <!--
   Note that it means *any* user-facing change including all aspects such as the documentation fix.
   If yes, please clarify the previous behavior and the change this PR proposes - provide the console output, description and/or an example to show the behavior difference if possible.
   If possible, please also clarify if this is a user-facing change compared to the released Spark versions or within the unreleased branches such as master.
   If no, write 'No'.
   -->
   
   Yes. The API change here is backward compatible and it extends the API to further support 3 layer namespace (e.g. catalog.database.table).
   ### How was this patch tested?
   <!--
   If tests were added, say they were added here. Please make sure to add some test cases that check the changes thoroughly including negative and positive cases if possible.
   If it was tested in a way different from regular unit tests, please clarify how you tested step by step, ideally copy and paste-able, so that other reviewers can test and check, and descendants can verify in the future.
   If tests were not added, please describe why they were not added and/or why it was difficult to add.
   If benchmark tests were added, please run the benchmarks in GitHub Actions for the consistent environment, and the instructions could accord to: https://spark.apache.org/developer-tools.html#github-workflow-benchmarks.
   -->
   
   UT
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a diff in pull request #36641: [SPARK-39263][SQL] Make GetTable, TableExists and DatabaseExists be compatible with 3 layer namespace

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #36641:
URL: https://github.com/apache/spark/pull/36641#discussion_r895846403


##########
sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala:
##########
@@ -250,8 +251,14 @@ class CatalogImpl(sparkSession: SparkSession) extends Catalog {
    * table/view. This throws an `AnalysisException` when no `Table` can be found.
    */
   override def getTable(tableName: String): Table = {
-    val tableIdent = sparkSession.sessionState.sqlParser.parseTableIdentifier(tableName)
-    getTable(tableIdent.database.orNull, tableIdent.table)
+    try {
+      val ident = sparkSession.sessionState.sqlParser.parseTableIdentifier(tableName)
+      getTable(ident.database.orNull, ident.table)
+    } catch {
+      case e: org.apache.spark.sql.catalyst.parser.ParseException =>
+        val ident = sparkSession.sessionState.sqlParser.parseMultipartIdentifier(tableName)

Review Comment:
   will `parseMultipartIdentifier` throw `ParseException`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] amaliujia commented on a diff in pull request #36641: [SPARK-39263][SQL] Make GetTable, TableExists and DatabaseExists be compatible with 3 layer namespace

Posted by GitBox <gi...@apache.org>.
amaliujia commented on code in PR #36641:
URL: https://github.com/apache/spark/pull/36641#discussion_r896161200


##########
sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala:
##########
@@ -250,8 +251,14 @@ class CatalogImpl(sparkSession: SparkSession) extends Catalog {
    * table/view. This throws an `AnalysisException` when no `Table` can be found.
    */
   override def getTable(tableName: String): Table = {
-    val tableIdent = sparkSession.sessionState.sqlParser.parseTableIdentifier(tableName)
-    getTable(tableIdent.database.orNull, tableIdent.table)
+    try {
+      val ident = sparkSession.sessionState.sqlParser.parseTableIdentifier(tableName)
+      getTable(ident.database.orNull, ident.table)
+    } catch {
+      case e: org.apache.spark.sql.catalyst.parser.ParseException =>
+        val ident = sparkSession.sessionState.sqlParser.parseMultipartIdentifier(tableName)

Review Comment:
   That arbitrary sequence of identifier will be passed to SQL analyzer to resolve the table. If user for example give `a.b.c.d.e.f`, the analyzer will just say this is not found.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a diff in pull request #36641: [SPARK-39263][SQL] Make GetTable, TableExists and DatabaseExists be compatible with 3 layer namespace

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #36641:
URL: https://github.com/apache/spark/pull/36641#discussion_r897682642


##########
sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala:
##########
@@ -287,16 +298,37 @@ class CatalogImpl(sparkSession: SparkSession) extends Catalog {
    * Checks if the database with the specified name exists.
    */
   override def databaseExists(dbName: String): Boolean = {
-    sessionCatalog.databaseExists(dbName)
+    // To maintain backwards compatibility, we first treat the input is a simple dbName and check
+    // if sessionCatalog contains it. If no, we try to parse it as 3 part name. If the parased
+    // identifier contains both catalog name and database name, we then search the database in the
+    // catalog.
+    if (!sessionCatalog.databaseExists(dbName)) {
+      val ident = sparkSession.sessionState.sqlParser.parseMultipartIdentifier(dbName)
+      val plan = sparkSession.sessionState.executePlan(UnresolvedNamespace(ident)).analyzed
+      plan match {
+        case ResolvedNamespace(catalog: InMemoryCatalog, _) => catalog.databaseExists(ident(1))
+        case _ => false
+      }

Review Comment:
   If the `CatalogPlugin` is an instance of `SupportsNamespaces`, then we can call its `namespaceExists` API. Otherwise, we can say the namespace always exists.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] amaliujia commented on a diff in pull request #36641: [SPARK-39263][SQL] Make GetTable, TableExists and DatabaseExists be compatible with 3 layer namespace

Posted by GitBox <gi...@apache.org>.
amaliujia commented on code in PR #36641:
URL: https://github.com/apache/spark/pull/36641#discussion_r896166516


##########
sql/core/src/test/scala/org/apache/spark/sql/internal/CatalogSuite.scala:
##########
@@ -681,4 +681,60 @@ class CatalogSuite extends SharedSparkSession with AnalysisTest with BeforeAndAf
     assert(spark.catalog.listTables("default").collect().map(_.name).toSet ==
       Set("my_table1", "my_table2", "my_temp_table"))
   }
+
+  test("three layer namespace compatibility - get table") {
+    val catalogName = "testcat"
+    val dbName = "my_db"
+    val tableName = "my_table"
+    val tableSchema = new StructType().add("i", "int")
+    val description = "this is a test table"
+
+    spark.catalog.createTable(
+      tableName = Array(catalogName, dbName, tableName).mkString("."),
+      source = classOf[FakeV2Provider].getName,
+      schema = tableSchema,
+      description = description,
+      options = Map.empty[String, String])
+
+    val t = spark.catalog.getTable(Array(catalogName, dbName, tableName).mkString("."))
+    val expectedTable =
+      new Table(
+        tableName,
+        catalogName,
+        Array(dbName),
+        description,
+        CatalogTableType.MANAGED.name,
+        false)
+    assert(expectedTable.toString == t.toString)
+  }
+
+  test("three layer namespace compatibility - table exists") {
+    val catalogName = "testcat"
+    val dbName = "my_db"
+    val tableName = "my_table"
+    val tableSchema = new StructType().add("i", "int")
+
+    assert(!spark.catalog.tableExists(Array(catalogName, dbName, tableName).mkString(".")))
+
+    spark.catalog.createTable(
+      tableName = Array(catalogName, dbName, tableName).mkString("."),
+      source = classOf[FakeV2Provider].getName,
+      schema = tableSchema,
+      description = "",
+      options = Map.empty[String, String])
+
+    assert(spark.catalog.tableExists(Array(catalogName, dbName, tableName).mkString(".")))
+  }
+
+  test("three layer namespace compatibility - database exists") {
+    val catalogName = "testcat"
+    val dbName = "my_db"
+    assert(!spark.catalog.databaseExists(Array(catalogName, dbName).mkString(".")))
+
+    val e = intercept[CatalogNotFoundException] {
+      val catalogName2 = "catalog_not_exists"
+      spark.catalog.databaseExists(Array(catalogName2, dbName).mkString("."))
+    }
+    assert(e.getMessage.contains("catalog_not_exists is not defined"))

Review Comment:
   done



##########
sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala:
##########
@@ -250,8 +251,14 @@ class CatalogImpl(sparkSession: SparkSession) extends Catalog {
    * table/view. This throws an `AnalysisException` when no `Table` can be found.
    */
   override def getTable(tableName: String): Table = {
-    val tableIdent = sparkSession.sessionState.sqlParser.parseTableIdentifier(tableName)
-    getTable(tableIdent.database.orNull, tableIdent.table)
+    try {
+      val ident = sparkSession.sessionState.sqlParser.parseTableIdentifier(tableName)

Review Comment:
   done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] amaliujia commented on a diff in pull request #36641: [SPARK-39263][SQL] Make GetTable, TableExists and DatabaseExists be compatible with 3 layer namespace

Posted by GitBox <gi...@apache.org>.
amaliujia commented on code in PR #36641:
URL: https://github.com/apache/spark/pull/36641#discussion_r897495588


##########
sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala:
##########
@@ -250,8 +251,18 @@ class CatalogImpl(sparkSession: SparkSession) extends Catalog {
    * table/view. This throws an `AnalysisException` when no `Table` can be found.
    */
   override def getTable(tableName: String): Table = {
-    val tableIdent = sparkSession.sessionState.sqlParser.parseTableIdentifier(tableName)
-    getTable(tableIdent.database.orNull, tableIdent.table)
+    // calling `sqlParser.parseTableIdentifier` to parse tableName. If it contains only table name
+    // and optionally contains a database name(thus a TableIdentifier), then that is used to get
+    // the table. Otherwise we try `sqlParser.parseMultipartIdentifier` to have a sequence of string

Review Comment:
   From my point of view:
   
   if the input can be parsed by `parseTableIdentifier`, it will be less than 3 names. In this case, we shall only look up in HMS, as the user not providing catalog name thus assuming this is the old behavior.
   
   And if input is 3 part name (or more for nested namespace), `parseTableIdentifier` will throw exception thus we try the 3 part name path.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a diff in pull request #36641: [SPARK-39263][SQL] Make GetTable, TableExists and DatabaseExists be compatible with 3 layer namespace

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #36641:
URL: https://github.com/apache/spark/pull/36641#discussion_r896286734


##########
sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala:
##########
@@ -250,8 +251,18 @@ class CatalogImpl(sparkSession: SparkSession) extends Catalog {
    * table/view. This throws an `AnalysisException` when no `Table` can be found.
    */
   override def getTable(tableName: String): Table = {
-    val tableIdent = sparkSession.sessionState.sqlParser.parseTableIdentifier(tableName)
-    getTable(tableIdent.database.orNull, tableIdent.table)
+    // calling `sqlParser.parseTableIdentifier` to parse tableName. If it contains only table name
+    // and optionally contains a database name(thus a TableIdentifier), then that is used to get
+    // the table. Otherwise we try `sqlParser.parseMultipartIdentifier` to have a sequence of string

Review Comment:
   And then one idea is, if we can't find this table in HMS, shall we fall back to lookup the table with v2 code path?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a diff in pull request #36641: [SPARK-39263][SQL] Make GetTable, TableExists and DatabaseExists be compatible with 3 layer namespace

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #36641:
URL: https://github.com/apache/spark/pull/36641#discussion_r895850050


##########
sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala:
##########
@@ -250,8 +251,14 @@ class CatalogImpl(sparkSession: SparkSession) extends Catalog {
    * table/view. This throws an `AnalysisException` when no `Table` can be found.
    */
   override def getTable(tableName: String): Table = {
-    val tableIdent = sparkSession.sessionState.sqlParser.parseTableIdentifier(tableName)
-    getTable(tableIdent.database.orNull, tableIdent.table)
+    try {
+      val ident = sparkSession.sessionState.sqlParser.parseTableIdentifier(tableName)

Review Comment:
   can we add some comments to explain the semantic here?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] amaliujia commented on a diff in pull request #36641: [SPARK-39263][SQL] Make GetTable, TableExists and DatabaseExists be compatible with 3 layer namespace

Posted by GitBox <gi...@apache.org>.
amaliujia commented on code in PR #36641:
URL: https://github.com/apache/spark/pull/36641#discussion_r896163957


##########
sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala:
##########
@@ -287,16 +294,44 @@ class CatalogImpl(sparkSession: SparkSession) extends Catalog {
    * Checks if the database with the specified name exists.
    */
   override def databaseExists(dbName: String): Boolean = {
-    sessionCatalog.databaseExists(dbName)
+    // To maintain backwards compatibility, we first treat the input is a simple dbName and check
+    // if sessionCatalog contains it. If no, we try to parse it as 3 part name. If the parased
+    // identifier contains both catalog name and database name, we then search the database in the
+    // catalog.
+    if (!sessionCatalog.databaseExists(dbName)) {
+      val ident = sparkSession.sessionState.sqlParser.parseMultipartIdentifier(dbName)
+      if (ident.length == 2) {

Review Comment:
   hmm I don't know how to deal with nested namespace without the help of SQL analyzer. I am using `sparkSession.sessionState.catalogManager.catalog(catalog_name)` to check if a database exists.
   
   
   Maybe we should add a V2 command to handle nested namespace? If I handle by calling `catalogManager`, it seems not to be extensible?  



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] amaliujia commented on a diff in pull request #36641: [SPARK-39263][SQL] Make GetTable, TableExists and DatabaseExists be compatible with 3 layer namespace

Posted by GitBox <gi...@apache.org>.
amaliujia commented on code in PR #36641:
URL: https://github.com/apache/spark/pull/36641#discussion_r898293801


##########
sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala:
##########
@@ -250,8 +251,18 @@ class CatalogImpl(sparkSession: SparkSession) extends Catalog {
    * table/view. This throws an `AnalysisException` when no `Table` can be found.
    */
   override def getTable(tableName: String): Table = {
-    val tableIdent = sparkSession.sessionState.sqlParser.parseTableIdentifier(tableName)
-    getTable(tableIdent.database.orNull, tableIdent.table)
+    // calling `sqlParser.parseTableIdentifier` to parse tableName. If it contains only table name
+    // and optionally contains a database name(thus a TableIdentifier), then that is used to get
+    // the table. Otherwise we try `sqlParser.parseMultipartIdentifier` to have a sequence of string

Review Comment:
   This is the place that was also confusing to me:
   
   In the past there is only HMS and people call API based on that assumption, in that case does we think whether users know there is a current catalog concept? 
   
   This is vague to me: people may or may not use the API with current catalog in mind. For example, one user may not know that we can set a current catalog then play against it. One user might just always believe whatever they called, it will hit HMS. This matches with the old implementation I feel like because the code directly check sessionCatalog which is HMS.
   
   I am just trying to understand to maintain backwards compatibility,  what is that old behaviors.
   
   If you think the old behavior is one that we already respected current catalog. I can change this implementation: when there is 2 part namespace, we first get current catalog, then check within that catalog .
   
   However I am thinking the idea above might break backwards behavior:
   1. A HMS user set current catalog but not to HMS.
   2. The user uses 2 part name and and the user expects to read from HMS.
   
   In the past even the user does the step 1, step 2 will still work as we just verify, for example, a DB in sessionCatalog.
   
   With respecting current catalog, step 2 will not succeed. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] amaliujia commented on a diff in pull request #36641: [SPARK-39263][SQL] Make GetTable, TableExists and DatabaseExists be compatible with 3 layer namespace

Posted by GitBox <gi...@apache.org>.
amaliujia commented on code in PR #36641:
URL: https://github.com/apache/spark/pull/36641#discussion_r896156559


##########
sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala:
##########
@@ -250,8 +251,14 @@ class CatalogImpl(sparkSession: SparkSession) extends Catalog {
    * table/view. This throws an `AnalysisException` when no `Table` can be found.
    */
   override def getTable(tableName: String): Table = {
-    val tableIdent = sparkSession.sessionState.sqlParser.parseTableIdentifier(tableName)
-    getTable(tableIdent.database.orNull, tableIdent.table)
+    try {
+      val ident = sparkSession.sessionState.sqlParser.parseTableIdentifier(tableName)
+      getTable(ident.database.orNull, ident.table)
+    } catch {
+      case e: org.apache.spark.sql.catalyst.parser.ParseException =>
+        val ident = sparkSession.sessionState.sqlParser.parseMultipartIdentifier(tableName)

Review Comment:
   No. For arbitrary sequence it can breaks that by dots to a Seq of string.
   
   `sparkSession.sessionState.sqlParser.parseTableIdentifie` will throw exception when the input is not `b` or `a.b` thus beyond a table identifier which only considers DB and table name in OSS. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a diff in pull request #36641: [SPARK-39263][SQL] Make GetTable, TableExists and DatabaseExists be compatible with 3 layer namespace

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #36641:
URL: https://github.com/apache/spark/pull/36641#discussion_r896286460


##########
sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala:
##########
@@ -250,8 +251,18 @@ class CatalogImpl(sparkSession: SparkSession) extends Catalog {
    * table/view. This throws an `AnalysisException` when no `Table` can be found.
    */
   override def getTable(tableName: String): Table = {
-    val tableIdent = sparkSession.sessionState.sqlParser.parseTableIdentifier(tableName)
-    getTable(tableIdent.database.orNull, tableIdent.table)
+    // calling `sqlParser.parseTableIdentifier` to parse tableName. If it contains only table name
+    // and optionally contains a database name(thus a TableIdentifier), then that is used to get
+    // the table. Otherwise we try `sqlParser.parseMultipartIdentifier` to have a sequence of string

Review Comment:
   let's be more specific about the semantics. If the table name has less than 3 parts, we always get the table from HMS first.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] amaliujia commented on pull request #36641: [SPARK-39263][SQL] Make GetTable, TableExists and DatabaseExists be compatible with 3 layer namespace

Posted by GitBox <gi...@apache.org>.
amaliujia commented on PR #36641:
URL: https://github.com/apache/spark/pull/36641#issuecomment-1159209838

   @cloud-fan 
   
   comments addressed. Can you take another look?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a diff in pull request #36641: [SPARK-39263][SQL] Make GetTable, TableExists and DatabaseExists be compatible with 3 layer namespace

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #36641:
URL: https://github.com/apache/spark/pull/36641#discussion_r895848763


##########
sql/core/src/test/scala/org/apache/spark/sql/internal/CatalogSuite.scala:
##########
@@ -681,4 +681,60 @@ class CatalogSuite extends SharedSparkSession with AnalysisTest with BeforeAndAf
     assert(spark.catalog.listTables("default").collect().map(_.name).toSet ==
       Set("my_table1", "my_table2", "my_temp_table"))
   }
+
+  test("three layer namespace compatibility - get table") {
+    val catalogName = "testcat"
+    val dbName = "my_db"
+    val tableName = "my_table"
+    val tableSchema = new StructType().add("i", "int")
+    val description = "this is a test table"
+
+    spark.catalog.createTable(
+      tableName = Array(catalogName, dbName, tableName).mkString("."),
+      source = classOf[FakeV2Provider].getName,
+      schema = tableSchema,
+      description = description,
+      options = Map.empty[String, String])
+
+    val t = spark.catalog.getTable(Array(catalogName, dbName, tableName).mkString("."))
+    val expectedTable =
+      new Table(
+        tableName,
+        catalogName,
+        Array(dbName),
+        description,
+        CatalogTableType.MANAGED.name,
+        false)
+    assert(expectedTable.toString == t.toString)
+  }
+
+  test("three layer namespace compatibility - table exists") {
+    val catalogName = "testcat"
+    val dbName = "my_db"
+    val tableName = "my_table"
+    val tableSchema = new StructType().add("i", "int")
+
+    assert(!spark.catalog.tableExists(Array(catalogName, dbName, tableName).mkString(".")))
+
+    spark.catalog.createTable(
+      tableName = Array(catalogName, dbName, tableName).mkString("."),
+      source = classOf[FakeV2Provider].getName,
+      schema = tableSchema,
+      description = "",
+      options = Map.empty[String, String])
+
+    assert(spark.catalog.tableExists(Array(catalogName, dbName, tableName).mkString(".")))
+  }
+
+  test("three layer namespace compatibility - database exists") {
+    val catalogName = "testcat"
+    val dbName = "my_db"
+    assert(!spark.catalog.databaseExists(Array(catalogName, dbName).mkString(".")))
+
+    val e = intercept[CatalogNotFoundException] {
+      val catalogName2 = "catalog_not_exists"
+      spark.catalog.databaseExists(Array(catalogName2, dbName).mkString("."))
+    }
+    assert(e.getMessage.contains("catalog_not_exists is not defined"))

Review Comment:
   We can use SQL statement to create database in this test.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] amaliujia commented on a diff in pull request #36641: [SPARK-39263][SQL] Make GetTable, TableExists and DatabaseExists be compatible with 3 layer namespace

Posted by GitBox <gi...@apache.org>.
amaliujia commented on code in PR #36641:
URL: https://github.com/apache/spark/pull/36641#discussion_r897502115


##########
sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala:
##########
@@ -287,16 +298,37 @@ class CatalogImpl(sparkSession: SparkSession) extends Catalog {
    * Checks if the database with the specified name exists.
    */
   override def databaseExists(dbName: String): Boolean = {
-    sessionCatalog.databaseExists(dbName)
+    // To maintain backwards compatibility, we first treat the input is a simple dbName and check
+    // if sessionCatalog contains it. If no, we try to parse it as 3 part name. If the parased
+    // identifier contains both catalog name and database name, we then search the database in the
+    // catalog.
+    if (!sessionCatalog.databaseExists(dbName)) {
+      val ident = sparkSession.sessionState.sqlParser.parseMultipartIdentifier(dbName)
+      val plan = sparkSession.sessionState.executePlan(UnresolvedNamespace(ident)).analyzed
+      plan match {
+        case ResolvedNamespace(catalog: InMemoryCatalog, _) => catalog.databaseExists(ident(1))
+        case _ => false
+      }

Review Comment:
   @cloud-fan 
   
   It seems that the `UnresolvedNamespace` will just be converted to `ResolvedNamespace` without a further validation: https://github.com/apache/spark/blob/7cfc40fff6a2bb4025b29d8dd9eb66734030a901/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala#L890
   
   So we cannot rely on whether the plan is resolved to determine if the DB exists. In current impl I have to access the catalog and verify it by myself. 
   
   However, `ResolvedNamespace`'s catalog's type is `CatalogPlugin`. I am not sure how to write general code to deal with it. Any ideas?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a diff in pull request #36641: [SPARK-39263][SQL] Make GetTable, TableExists and DatabaseExists be compatible with 3 layer namespace

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #36641:
URL: https://github.com/apache/spark/pull/36641#discussion_r896288305


##########
sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala:
##########
@@ -287,16 +294,44 @@ class CatalogImpl(sparkSession: SparkSession) extends Catalog {
    * Checks if the database with the specified name exists.
    */
   override def databaseExists(dbName: String): Boolean = {
-    sessionCatalog.databaseExists(dbName)
+    // To maintain backwards compatibility, we first treat the input is a simple dbName and check
+    // if sessionCatalog contains it. If no, we try to parse it as 3 part name. If the parased
+    // identifier contains both catalog name and database name, we then search the database in the
+    // catalog.
+    if (!sessionCatalog.databaseExists(dbName)) {
+      val ident = sparkSession.sessionState.sqlParser.parseMultipartIdentifier(dbName)
+      if (ident.length == 2) {

Review Comment:
   We can follow what we did in `makeTable`: Create a `UnresolvedNamespace` logical plan and resolve it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #36641: [SPARK-39263] Make GetTable, TableExists and DatabaseExists be compatible with 3 layer namespace

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on PR #36641:
URL: https://github.com/apache/spark/pull/36641#issuecomment-1135352493

   Can one of the admins verify this patch?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] amaliujia commented on a diff in pull request #36641: [SPARK-39263][SQL] Make GetTable, TableExists and DatabaseExists be compatible with 3 layer namespace

Posted by GitBox <gi...@apache.org>.
amaliujia commented on code in PR #36641:
URL: https://github.com/apache/spark/pull/36641#discussion_r897495588


##########
sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala:
##########
@@ -250,8 +251,18 @@ class CatalogImpl(sparkSession: SparkSession) extends Catalog {
    * table/view. This throws an `AnalysisException` when no `Table` can be found.
    */
   override def getTable(tableName: String): Table = {
-    val tableIdent = sparkSession.sessionState.sqlParser.parseTableIdentifier(tableName)
-    getTable(tableIdent.database.orNull, tableIdent.table)
+    // calling `sqlParser.parseTableIdentifier` to parse tableName. If it contains only table name
+    // and optionally contains a database name(thus a TableIdentifier), then that is used to get
+    // the table. Otherwise we try `sqlParser.parseMultipartIdentifier` to have a sequence of string

Review Comment:
   From my point of view:
   
   if the input can be parsed by `parseTableIdentifier`, it will be less than 3 names. In this case, we shall only look up in HMS, as the user not providing catalog name thus assuming this is the old behavior.
   
   And if input is 3 part name (or more for nested namespace), `parseTableIdentifier` will throw exception thus we try the 3 part name path.
   
   So my answer is we don't fall back to v2 code path if 1) the input is 2 part name 2) and even the input cannot be found from HMS.
   
   What do you think?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a diff in pull request #36641: [SPARK-39263][SQL] Make GetTable, TableExists and DatabaseExists be compatible with 3 layer namespace

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #36641:
URL: https://github.com/apache/spark/pull/36641#discussion_r895849585


##########
sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala:
##########
@@ -287,16 +294,44 @@ class CatalogImpl(sparkSession: SparkSession) extends Catalog {
    * Checks if the database with the specified name exists.
    */
   override def databaseExists(dbName: String): Boolean = {
-    sessionCatalog.databaseExists(dbName)
+    // To maintain backwards compatibility, we first treat the input is a simple dbName and check
+    // if sessionCatalog contains it. If no, we try to parse it as 3 part name. If the parased
+    // identifier contains both catalog name and database name, we then search the database in the
+    // catalog.
+    if (!sessionCatalog.databaseExists(dbName)) {
+      val ident = sparkSession.sessionState.sqlParser.parseMultipartIdentifier(dbName)
+      if (ident.length == 2) {

Review Comment:
   hmm shall we consider multi-level namespace like `cat.ns1.ns2`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] amaliujia commented on a diff in pull request #36641: [SPARK-39263][SQL] Make GetTable, TableExists and DatabaseExists be compatible with 3 layer namespace

Posted by GitBox <gi...@apache.org>.
amaliujia commented on code in PR #36641:
URL: https://github.com/apache/spark/pull/36641#discussion_r894880544


##########
sql/core/src/test/scala/org/apache/spark/sql/internal/CatalogSuite.scala:
##########
@@ -681,4 +681,60 @@ class CatalogSuite extends SharedSparkSession with AnalysisTest with BeforeAndAf
     assert(spark.catalog.listTables("default").collect().map(_.name).toSet ==
       Set("my_table1", "my_table2", "my_temp_table"))
   }
+
+  test("three layer namespace compatibility - get table") {
+    val catalogName = "testcat"
+    val dbName = "my_db"
+    val tableName = "my_table"
+    val tableSchema = new StructType().add("i", "int")
+    val description = "this is a test table"
+
+    spark.catalog.createTable(
+      tableName = Array(catalogName, dbName, tableName).mkString("."),
+      source = classOf[FakeV2Provider].getName,
+      schema = tableSchema,
+      description = description,
+      options = Map.empty[String, String])
+
+    val t = spark.catalog.getTable(Array(catalogName, dbName, tableName).mkString("."))
+    val expectedTable =
+      new Table(
+        tableName,
+        catalogName,
+        Array(dbName),
+        description,
+        CatalogTableType.MANAGED.name,
+        false)
+    assert(expectedTable.toString == t.toString)
+  }
+
+  test("three layer namespace compatibility - table exists") {
+    val catalogName = "testcat"
+    val dbName = "my_db"
+    val tableName = "my_table"
+    val tableSchema = new StructType().add("i", "int")
+
+    assert(!spark.catalog.tableExists(Array(catalogName, dbName, tableName).mkString(".")))
+
+    spark.catalog.createTable(
+      tableName = Array(catalogName, dbName, tableName).mkString("."),
+      source = classOf[FakeV2Provider].getName,
+      schema = tableSchema,
+      description = "",
+      options = Map.empty[String, String])
+
+    assert(spark.catalog.tableExists(Array(catalogName, dbName, tableName).mkString(".")))
+  }
+
+  test("three layer namespace compatibility - database exists") {
+    val catalogName = "testcat"
+    val dbName = "my_db"
+    assert(!spark.catalog.databaseExists(Array(catalogName, dbName).mkString(".")))
+
+    val e = intercept[CatalogNotFoundException] {
+      val catalogName2 = "catalog_not_exists"
+      spark.catalog.databaseExists(Array(catalogName2, dbName).mkString("."))
+    }
+    assert(e.getMessage.contains("catalog_not_exists is not defined"))

Review Comment:
   I didn't find a create namespace call in Catalog API so not sure how to create a DB and test positive case



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a diff in pull request #36641: [SPARK-39263][SQL] Make GetTable, TableExists and DatabaseExists be compatible with 3 layer namespace

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #36641:
URL: https://github.com/apache/spark/pull/36641#discussion_r897689496


##########
sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala:
##########
@@ -250,8 +251,18 @@ class CatalogImpl(sparkSession: SparkSession) extends Catalog {
    * table/view. This throws an `AnalysisException` when no `Table` can be found.
    */
   override def getTable(tableName: String): Table = {
-    val tableIdent = sparkSession.sessionState.sqlParser.parseTableIdentifier(tableName)
-    getTable(tableIdent.database.orNull, tableIdent.table)
+    // calling `sqlParser.parseTableIdentifier` to parse tableName. If it contains only table name
+    // and optionally contains a database name(thus a TableIdentifier), then that is used to get
+    // the table. Otherwise we try `sqlParser.parseMultipartIdentifier` to have a sequence of string

Review Comment:
   TBH it's a bit weird that this scala API does not respect the current catalog, while SQL API does.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan closed pull request #36641: [SPARK-39263][SQL] Make GetTable, TableExists and DatabaseExists be compatible with 3 layer namespace

Posted by GitBox <gi...@apache.org>.
cloud-fan closed pull request #36641: [SPARK-39263][SQL] Make GetTable, TableExists and DatabaseExists be compatible with 3 layer namespace
URL: https://github.com/apache/spark/pull/36641


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on pull request #36641: [SPARK-39263][SQL] Make GetTable, TableExists and DatabaseExists be compatible with 3 layer namespace

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on PR #36641:
URL: https://github.com/apache/spark/pull/36641#issuecomment-1161188261

   thanks, merging to master!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] amaliujia commented on a diff in pull request #36641: [SPARK-39263][SQL] Make GetTable, TableExists and DatabaseExists be compatible with 3 layer namespace

Posted by GitBox <gi...@apache.org>.
amaliujia commented on code in PR #36641:
URL: https://github.com/apache/spark/pull/36641#discussion_r894880106


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala:
##########
@@ -966,7 +966,7 @@ class SessionCatalog(
   }
 
   def isGlobalTempViewDB(dbName: String): Boolean = {
-    globalTempViewManager.database.equals(dbName)
+    globalTempViewManager.database.equalsIgnoreCase(dbName)

Review Comment:
   This is a follow up change to https://github.com/apache/spark/pull/36586/files/2864b6ab46cde170c7b104d572b265cbc1e13687#r893029290



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] amaliujia commented on pull request #36641: [SPARK-39263][SQL] Make GetTable, TableExists and DatabaseExists be compatible with 3 layer namespace

Posted by GitBox <gi...@apache.org>.
amaliujia commented on PR #36641:
URL: https://github.com/apache/spark/pull/36641#issuecomment-1154465576

   @cloud-fan comments addressed and there is one that we can discuss.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] amaliujia commented on pull request #36641: [SPARK-39263][SQL] Make GetTable, TableExists and DatabaseExists be compatible with 3 layer namespace

Posted by GitBox <gi...@apache.org>.
amaliujia commented on PR #36641:
URL: https://github.com/apache/spark/pull/36641#issuecomment-1136409231

   cc @cloud-fan this PR is built on top of https://github.com/apache/spark/pull/36586, I will rebase after https://github.com/apache/spark/pull/36586 is merged.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org