Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/06/23 19:39:32 UTC

[GitHub] [spark] schuermannator opened a new pull request, #36968: [SPARK-39235][SQL] make getDatabase compatible with 3 layer namespace

schuermannator opened a new pull request, #36968:
URL: https://github.com/apache/spark/pull/36968

   <!--
   Thanks for sending a pull request!  Here are some tips for you:
     1. If this is your first time, please read our contributor guidelines: https://spark.apache.org/contributing.html
     2. Ensure you have added or run the appropriate tests for your PR: https://spark.apache.org/developer-tools.html
     3. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP][SPARK-XXXX] Your PR title ...'.
     4. Be sure to keep the PR description updated to reflect all changes.
     5. Please write your PR title to summarize what this PR proposes.
     6. If possible, provide a concise example to reproduce the issue for a faster review.
     7. If you want to add a new configuration, please read the guideline first for naming configurations in
        'core/src/main/scala/org/apache/spark/internal/config/ConfigEntry.scala'.
     8. If you want to add or modify an error type or message, please read the guideline first in
        'core/src/main/resources/error/README.md'.
   -->
   
   ### What changes were proposed in this pull request?
   <!--
   Please clarify what changes you are proposing. The purpose of this section is to outline the changes and how this PR fixes the issue. 
   If possible, please consider writing useful notes for better and faster reviews in your PR. See the examples below.
     1. If you refactor some codes with changing classes, showing the class hierarchy will help reviewers.
     2. If you fix some SQL features, you can provide some references of other DBMSes.
     3. If there is design documentation, please add the link.
     4. If there is a discussion in the mailing list, please add the link.
   -->
   Change the `getDatabase` catalog API to support a 3-layer namespace. If the database exists in the session catalog, we return it as before. Otherwise, we parse the name as a multi-part (`catalog.database`) name and look the database up in the V2 catalog.
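
   A condensed sketch of the resolution flow (simplified from the `CatalogImpl.getDatabase` change quoted in the review comments below; imports and the fallback for catalogs that do not implement `SupportsNamespaces` are omitted):

   ```scala
   override def getDatabase(dbName: String): Database = {
     // Backwards-compatible path: treat `dbName` as a single database name first.
     if (sessionCatalog.databaseExists(dbName) || sessionCatalog.isGlobalTempViewDB(dbName)) {
       makeDatabase(dbName)
     } else {
       // Otherwise parse it as a multi-part identifier and resolve it against the V2 catalog.
       val ident = sparkSession.sessionState.sqlParser.parseMultipartIdentifier(dbName)
       val resolved = sparkSession.sessionState.executePlan(UnresolvedNamespace(ident)).analyzed
       resolved match {
         case ResolvedNamespace(catalog: SupportsNamespaces, namespace) =>
           val metadata = catalog.loadNamespaceMetadata(namespace.toArray)
           new Database(
             name = namespace.mkString("."),
             catalog = catalog.name,
             description = metadata.get(SupportsNamespaces.PROP_COMMENT),
             locationUri = metadata.get(SupportsNamespaces.PROP_LOCATION))
       }
     }
   }
   ```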
   
   ### Why are the changes needed?
   <!--
   Please clarify why the changes are needed. For instance,
     1. If you propose a new API, clarify the use case for a new API.
     2. If you fix a bug, you can clarify why it is a bug.
   -->
   `getDatabase` does not support 3-layer namespaces.
   
   ### Does this PR introduce _any_ user-facing change?
   <!--
   Note that it means *any* user-facing change including all aspects such as the documentation fix.
   If yes, please clarify the previous behavior and the change this PR proposes - provide the console output, description and/or an example to show the behavior difference if possible.
   If possible, please also clarify if this is a user-facing change compared to the released Spark versions or within the unreleased branches such as master.
   If no, write 'No'.
   -->
   Yes. This PR introduces a backwards-compatible API change to support the 3-layer namespace (e.g. `catalog.database.table`).
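
   For example (illustrative only; this assumes a V2 catalog named `testcat` is configured, and the namespace names are placeholders):

   ```scala
   // Existing behavior (Spark 3.3 and prior) keeps working with a plain database name:
   spark.catalog.getDatabase("default")

   // New: a catalog-qualified name is also accepted.
   val db = spark.catalog.getDatabase("testcat.ns.somedb")
   // db.name is the namespace within the catalog, e.g. "ns.somedb"
   ```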
   
   
   ### How was this patch tested?
   <!--
   If tests were added, say they were added here. Please make sure to add some test cases that check the changes thoroughly including negative and positive cases if possible.
   If it was tested in a way different from regular unit tests, please clarify how you tested step by step, ideally copy and paste-able, so that other reviewers can test and check, and descendants can verify in the future.
   If tests were not added, please describe why they were not added and/or why it was difficult to add.
   If benchmark tests were added, please run the benchmarks in GitHub Actions for the consistent environment, and the instructions could accord to: https://spark.apache.org/developer-tools.html#github-workflow-benchmarks.
   -->
   Unit tests (added to `CatalogSuite`).



[GitHub] [spark] amaliujia commented on pull request #36968: [SPARK-39235][SQL] Make getDatabase and listDatabases compatible with 3 layer namespace

Posted by GitBox <gi...@apache.org>.
amaliujia commented on PR #36968:
URL: https://github.com/apache/spark/pull/36968#issuecomment-1168277746

   @cloud-fan the test failure is being investigated and fixed, but I think the major comments have already been addressed.



[GitHub] [spark] cloud-fan commented on pull request #36968: [SPARK-39645][SQL] Make getDatabase and listDatabases compatible with 3 layer namespace

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on PR #36968:
URL: https://github.com/apache/spark/pull/36968#issuecomment-1171835086

   thanks, merging to master!



[GitHub] [spark] amaliujia commented on pull request #36968: [SPARK-39645][SQL] Make getDatabase and listDatabases compatible with 3 layer namespace

Posted by GitBox <gi...@apache.org>.
amaliujia commented on PR #36968:
URL: https://github.com/apache/spark/pull/36968#issuecomment-1171612901

   @cloud-fan tests have passed



[GitHub] [spark] schuermannator commented on a diff in pull request #36968: [SPARK-39235][SQL] Make getDatabase and listDatabases compatible with 3 layer namespace

Posted by GitBox <gi...@apache.org>.
schuermannator commented on code in PR #36968:
URL: https://github.com/apache/spark/pull/36968#discussion_r907641599


##########
sql/core/src/test/scala/org/apache/spark/sql/internal/CatalogSuite.scala:
##########
@@ -743,4 +743,25 @@ class CatalogSuite extends SharedSparkSession with AnalysisTest with BeforeAndAf
     val catalogName2 = "catalog_not_exists"
     assert(!spark.catalog.databaseExists(Array(catalogName2, dbName).mkString(".")))
   }
+
+  test("three layer namespace compatibility - get database") {
+    Seq(("testcat", "somedb"), ("testcat", "ns.somedb")).foreach { case (catalog, dbName) =>
+      val qualifiedDb = s"$catalog.$dbName"
+      // TODO test properties? WITH DBPROPERTIES (prop='val')
+      sql(s"CREATE NAMESPACE $qualifiedDb COMMENT 'test comment' LOCATION '/test/location'")
+      val db = spark.catalog.getDatabase(qualifiedDb)
+      assert(db.name === dbName)
+      assert(db.description === "test comment")
+      assert(db.locationUri === "file:/test/location")
+    }
+    intercept[AnalysisException](spark.catalog.getDatabase("randomcat.db10"))
+  }
+
+  test("get database when there is `default` catalog") {
+    spark.conf.set("spark.sql.catalog.default", classOf[InMemoryCatalog].getName)
+    val db = "testdb"
+    val qualified = s"default.$db"
+    sql(s"CREATE NAMESPACE $qualified")

Review Comment:
   I've implemented this test, but it currently just verifies via the 'comment' set on the Database. Let me know if there is some `Database.catalog` API I am not aware of.




[GitHub] [spark] AmplabJenkins commented on pull request #36968: [SPARK-39235][SQL] make getDatabase compatible with 3 layer namespace

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on PR #36968:
URL: https://github.com/apache/spark/pull/36968#issuecomment-1164853848

   Can one of the admins verify this patch?



[GitHub] [spark] cloud-fan commented on a diff in pull request #36968: [SPARK-39235][SQL] Make getDatabase compatible with 3 layer namespace

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #36968:
URL: https://github.com/apache/spark/pull/36968#discussion_r905774631


##########
sql/core/src/test/scala/org/apache/spark/sql/internal/CatalogSuite.scala:
##########
@@ -743,4 +743,25 @@ class CatalogSuite extends SharedSparkSession with AnalysisTest with BeforeAndAf
     val catalogName2 = "catalog_not_exists"
     assert(!spark.catalog.databaseExists(Array(catalogName2, dbName).mkString(".")))
   }
+
+  test("three layer namespace compatibility - get database") {
+    Seq(("testcat", "somedb"), ("testcat", "ns.somedb")).foreach { case (catalog, dbName) =>
+      val qualifiedDb = s"$catalog.$dbName"
+      // TODO test properties? WITH DBPROPERTIES (prop='val')

Review Comment:
   Shall we remove this TODO? It seems to be done.




[GitHub] [spark] amaliujia commented on pull request #36968: [SPARK-39235][SQL] Make getDatabase compatible with 3 layer namespace

Posted by GitBox <gi...@apache.org>.
amaliujia commented on PR #36968:
URL: https://github.com/apache/spark/pull/36968#issuecomment-1165955298

   @schuermannator 
   
   By the way, could you also change the `listDatabases` behavior in this PR? I just realized it is not correct given the changes we are making. It is better done in your PR because it will rely on your current `getDatabase` change.
   
   
   It could look like the following code:
   ```
     override def listDatabases(): Dataset[Database] = {
       val catalog = currentCatalog()
       val plan = ShowNamespaces(UnresolvedNamespace(Seq(catalog)), None)
       val ret = sparkSession.sessionState.executePlan(plan).toRdd.collect()
       val databases = ret
         .map(row => catalog + "." + row.getString(0))
         .map(getDatabase)
       CatalogImpl.makeDataset(databases, sparkSession)
     }
   ```
   
   You can use this code to test it:
   ```
     test("list databases with current catalog") {
       spark.catalog.setCurrentCatalog("testcat")
       sql(s"CREATE NAMESPACE testcat.my_db")
       sql(s"CREATE NAMESPACE testcat.my_db2")
       assert(spark.catalog.listDatabases().collect().map(_.name).toSet == Set("my_db", "my_db2"))
     }
   ```
   
   



[GitHub] [spark] schuermannator commented on a diff in pull request #36968: [SPARK-39235][SQL] Make getDatabase and listDatabases compatible with 3 layer namespace

Posted by GitBox <gi...@apache.org>.
schuermannator commented on code in PR #36968:
URL: https://github.com/apache/spark/pull/36968#discussion_r909217647


##########
sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala:
##########
@@ -243,7 +247,30 @@ class CatalogImpl(sparkSession: SparkSession) extends Catalog {
    * `Database` can be found.
    */
   override def getDatabase(dbName: String): Database = {
-    makeDatabase(dbName)
+    // `dbName` could be either a single database name (behavior in Spark 3.3 and prior) or a
+    // qualified namespace with catalog name. To maintain backwards compatibility, we first assume
+    // it's a single database name and return the database from sessionCatalog if it exists.
+    // Otherwise we try 3-part name parsing and locate the database. If the parsed identifier
+    // contains both catalog name and database name, we then search the database in the catalog.
+    if (sessionCatalog.databaseExists(dbName) || sessionCatalog.isGlobalTempViewDB(dbName)) {
+      makeDatabase(dbName)
+    } else {
+      val ident = sparkSession.sessionState.sqlParser.parseMultipartIdentifier(dbName)
+      val plan = UnresolvedNamespace(ident)
+      val resolved = sparkSession.sessionState.executePlan(plan).analyzed
+      val db = ident.tail
+      resolved match {
+        case ResolvedNamespace(catalog: SupportsNamespaces, _) =>
+          val metadata = catalog.loadNamespaceMetadata(db.toArray)
+          new Database(

Review Comment:
   Added! I went ahead and pushed what I have, but other tests will likely break and I will revisit tomorrow.




[GitHub] [spark] amaliujia commented on a diff in pull request #36968: [SPARK-39235][SQL] Make getDatabase and listDatabases compatible with 3 layer namespace

Posted by GitBox <gi...@apache.org>.
amaliujia commented on code in PR #36968:
URL: https://github.com/apache/spark/pull/36968#discussion_r907636962


##########
sql/core/src/test/scala/org/apache/spark/sql/internal/CatalogSuite.scala:
##########
@@ -743,4 +743,25 @@ class CatalogSuite extends SharedSparkSession with AnalysisTest with BeforeAndAf
     val catalogName2 = "catalog_not_exists"
     assert(!spark.catalog.databaseExists(Array(catalogName2, dbName).mkString(".")))
   }
+
+  test("three layer namespace compatibility - get database") {
+    Seq(("testcat", "somedb"), ("testcat", "ns.somedb")).foreach { case (catalog, dbName) =>
+      val qualifiedDb = s"$catalog.$dbName"
+      // TODO test properties? WITH DBPROPERTIES (prop='val')

Review Comment:
   If there is no properties returned in the `Database` object, then it is ok to skip such properties test.




[GitHub] [spark] cloud-fan commented on a diff in pull request #36968: [SPARK-39235][SQL] Make getDatabase and listDatabases compatible with 3 layer namespace

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #36968:
URL: https://github.com/apache/spark/pull/36968#discussion_r908107238


##########
sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala:
##########
@@ -243,7 +247,30 @@ class CatalogImpl(sparkSession: SparkSession) extends Catalog {
    * `Database` can be found.
    */
   override def getDatabase(dbName: String): Database = {
-    makeDatabase(dbName)
+    // `dbName` could be either a single database name (behavior in Spark 3.3 and prior) or a
+    // qualified namespace with catalog name. To maintain backwards compatibility, we first assume
+    // it's a single database name and return the database from sessionCatalog if it exists.
+    // Otherwise we try 3-part name parsing and locate the database. If the parsed identifier
+    // contains both catalog name and database name, we then search the database in the catalog.
+    if (sessionCatalog.databaseExists(dbName) || sessionCatalog.isGlobalTempViewDB(dbName)) {
+      makeDatabase(dbName)
+    } else {
+      val ident = sparkSession.sessionState.sqlParser.parseMultipartIdentifier(dbName)
+      val plan = UnresolvedNamespace(ident)
+      val resolved = sparkSession.sessionState.executePlan(plan).analyzed
+      val db = ident.tail
+      resolved match {
+        case ResolvedNamespace(catalog: SupportsNamespaces, _) =>
+          val metadata = catalog.loadNamespaceMetadata(db.toArray)
+          new Database(

Review Comment:
   shall we add a new `catalog: String` field to `Database`?




[GitHub] [spark] amaliujia commented on a diff in pull request #36968: [SPARK-39235][SQL] Make getDatabase and listDatabases compatible with 3 layer namespace

Posted by GitBox <gi...@apache.org>.
amaliujia commented on code in PR #36968:
URL: https://github.com/apache/spark/pull/36968#discussion_r908791893


##########
sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala:
##########
@@ -243,7 +247,30 @@ class CatalogImpl(sparkSession: SparkSession) extends Catalog {
    * `Database` can be found.
    */
   override def getDatabase(dbName: String): Database = {
-    makeDatabase(dbName)
+    // `dbName` could be either a single database name (behavior in Spark 3.3 and prior) or a
+    // qualified namespace with catalog name. To maintain backwards compatibility, we first assume
+    // it's a single database name and return the database from sessionCatalog if it exists.
+    // Otherwise we try 3-part name parsing and locate the database. If the parsed identifier
+    // contains both catalog name and database name, we then search the database in the catalog.
+    if (sessionCatalog.databaseExists(dbName) || sessionCatalog.isGlobalTempViewDB(dbName)) {
+      makeDatabase(dbName)
+    } else {
+      val ident = sparkSession.sessionState.sqlParser.parseMultipartIdentifier(dbName)
+      val plan = UnresolvedNamespace(ident)
+      val resolved = sparkSession.sessionState.executePlan(plan).analyzed
+      val db = ident.tail
+      resolved match {
+        case ResolvedNamespace(catalog: SupportsNamespaces, _) =>
+          val metadata = catalog.loadNamespaceMetadata(db.toArray)

Review Comment:
   Big +1 




[GitHub] [spark] cloud-fan closed pull request #36968: [SPARK-39645][SQL] Make getDatabase and listDatabases compatible with 3 layer namespace

Posted by GitBox <gi...@apache.org>.
cloud-fan closed pull request #36968: [SPARK-39645][SQL] Make getDatabase and listDatabases compatible with 3 layer namespace
URL: https://github.com/apache/spark/pull/36968



[GitHub] [spark] schuermannator commented on a diff in pull request #36968: [SPARK-39235][SQL] Make getDatabase and listDatabases compatible with 3 layer namespace

Posted by GitBox <gi...@apache.org>.
schuermannator commented on code in PR #36968:
URL: https://github.com/apache/spark/pull/36968#discussion_r910594525


##########
sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala:
##########
@@ -296,7 +300,36 @@ class CatalogImpl(sparkSession: SparkSession) extends Catalog {
    * `Database` can be found.
    */
   override def getDatabase(dbName: String): Database = {
-    makeDatabase(dbName)
+    // `dbName` could be either a single database name (behavior in Spark 3.3 and prior) or a
+    // qualified namespace with catalog name. To maintain backwards compatibility, we first assume
+    // it's a single database name and return the database from sessionCatalog if it exists.
+    // Otherwise we try 3-part name parsing and locate the database. If the parsed identifier
+    // contains both catalog name and database name, we then search the database in the catalog.
+    if (sessionCatalog.databaseExists(dbName) || sessionCatalog.isGlobalTempViewDB(dbName)) {
+      makeDatabase(dbName)
+    } else {
+      val ident = sparkSession.sessionState.sqlParser.parseMultipartIdentifier(dbName)
+      val plan = UnresolvedNamespace(ident)
+      val resolved = sparkSession.sessionState.executePlan(plan).analyzed
+      resolved match {
+        case ResolvedNamespace(catalog: SupportsNamespaces, namespace) =>
+          val metadata = catalog.loadNamespaceMetadata(namespace.toArray)
+          new Database(
+            name = namespace.mkString("."),
+            catalog = catalog.name,
+            description = metadata.get(SupportsNamespaces.PROP_COMMENT),
+            locationUri = metadata.get(SupportsNamespaces.PROP_LOCATION))
+        // similar to databaseExists: if the catalog doesn't support namespaces, we assume it's an
+        // implicit namespace, which exists but has no metadata.
+        case ResolvedNamespace(catalog: CatalogPlugin, namespace) =>
+          new Database(
+            name = dbName,

Review Comment:
   doing `namespace.quoted` same as above




[GitHub] [spark] HyukjinKwon commented on pull request #36968: [SPARK-39235][SQL] Make getDatabase compatible with 3 layer namespace

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on PR #36968:
URL: https://github.com/apache/spark/pull/36968#issuecomment-1165043500

   cc @zhengruifeng FYI



[GitHub] [spark] amaliujia commented on pull request #36968: [SPARK-39235][SQL] Make getDatabase compatible with 3 layer namespace

Posted by GitBox <gi...@apache.org>.
amaliujia commented on PR #36968:
URL: https://github.com/apache/spark/pull/36968#issuecomment-1165199122

   R: @cloud-fan can you take a look?



[GitHub] [spark] amaliujia commented on a diff in pull request #36968: [SPARK-39235][SQL] Make getDatabase and listDatabases compatible with 3 layer namespace

Posted by GitBox <gi...@apache.org>.
amaliujia commented on code in PR #36968:
URL: https://github.com/apache/spark/pull/36968#discussion_r909167528


##########
sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala:
##########
@@ -243,7 +247,30 @@ class CatalogImpl(sparkSession: SparkSession) extends Catalog {
    * `Database` can be found.
    */
   override def getDatabase(dbName: String): Database = {
-    makeDatabase(dbName)
+    // `dbName` could be either a single database name (behavior in Spark 3.3 and prior) or a
+    // qualified namespace with catalog name. To maintain backwards compatibility, we first assume
+    // it's a single database name and return the database from sessionCatalog if it exists.
+    // Otherwise we try 3-part name parsing and locate the database. If the parsed identifier
+    // contains both catalog name and database name, we then search the database in the catalog.
+    if (sessionCatalog.databaseExists(dbName) || sessionCatalog.isGlobalTempViewDB(dbName)) {
+      makeDatabase(dbName)
+    } else {
+      val ident = sparkSession.sessionState.sqlParser.parseMultipartIdentifier(dbName)
+      val plan = UnresolvedNamespace(ident)
+      val resolved = sparkSession.sessionState.executePlan(plan).analyzed
+      val db = ident.tail
+      resolved match {
+        case ResolvedNamespace(catalog: SupportsNamespaces, _) =>
+          val metadata = catalog.loadNamespaceMetadata(db.toArray)
+          new Database(
+            name = db.mkString("."),

Review Comment:
   nit: ResolvedNamespace has a `namespace`




[GitHub] [spark] cloud-fan commented on a diff in pull request #36968: [SPARK-39235][SQL] Make getDatabase compatible with 3 layer namespace

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #36968:
URL: https://github.com/apache/spark/pull/36968#discussion_r905774400


##########
sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala:
##########
@@ -243,7 +243,29 @@ class CatalogImpl(sparkSession: SparkSession) extends Catalog {
    * `Database` can be found.
    */
   override def getDatabase(dbName: String): Database = {
-    makeDatabase(dbName)
+    // `dbName` could be either a single database name (behavior in Spark 3.3 and prior) or a
+    // qualified namespace with catalog name. To maintain backwards compatibility, we first assume
+    // it's a single database name and return the database from sessionCatalog if it exists.
+    // Otherwise we try 3-part name parsing and locate the database. If the parsed identifier
+    // contains both catalog name and database name, we then search the database in the catalog.
+    if (sessionCatalog.databaseExists(dbName) || sessionCatalog.isGlobalTempViewDB(dbName)) {
+      makeDatabase(dbName)
+    } else {
+      val ident = sparkSession.sessionState.sqlParser.parseMultipartIdentifier(dbName)
+      val plan = UnresolvedNamespace(ident)
+      val resolved = sparkSession.sessionState.executePlan(plan).analyzed
+      val db = ident.tail
+      val metadata = resolved match {
+        case ResolvedNamespace(catalog: SupportsNamespaces, _) =>
+          catalog.loadNamespaceMetadata(db.toArray)
+        // TODO what to do if it doesn't support namespaces
+        case _ => throw new RuntimeException(s"unexpected catalog resolved: $resolved")
+      }
+      new Database(
+        name = db.mkString("."),
+        description = metadata.get("comment"),

Review Comment:
   let's not hardcode it, use `SupportsNamespaces.PROP_COMMENT` instead




[GitHub] [spark] cloud-fan commented on a diff in pull request #36968: [SPARK-39235][SQL] Make getDatabase and listDatabases compatible with 3 layer namespace

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #36968:
URL: https://github.com/apache/spark/pull/36968#discussion_r910528924


##########
sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala:
##########
@@ -296,7 +300,36 @@ class CatalogImpl(sparkSession: SparkSession) extends Catalog {
    * `Database` can be found.
    */
   override def getDatabase(dbName: String): Database = {
-    makeDatabase(dbName)
+    // `dbName` could be either a single database name (behavior in Spark 3.3 and prior) or a
+    // qualified namespace with catalog name. To maintain backwards compatibility, we first assume
+    // it's a single database name and return the database from sessionCatalog if it exists.
+    // Otherwise we try 3-part name parsing and locate the database. If the parsed identifier
+    // contains both catalog name and database name, we then search the database in the catalog.
+    if (sessionCatalog.databaseExists(dbName) || sessionCatalog.isGlobalTempViewDB(dbName)) {
+      makeDatabase(dbName)
+    } else {
+      val ident = sparkSession.sessionState.sqlParser.parseMultipartIdentifier(dbName)
+      val plan = UnresolvedNamespace(ident)
+      val resolved = sparkSession.sessionState.executePlan(plan).analyzed
+      resolved match {
+        case ResolvedNamespace(catalog: SupportsNamespaces, namespace) =>
+          val metadata = catalog.loadNamespaceMetadata(namespace.toArray)
+          new Database(
+            name = namespace.mkString("."),
+            catalog = catalog.name,
+            description = metadata.get(SupportsNamespaces.PROP_COMMENT),
+            locationUri = metadata.get(SupportsNamespaces.PROP_LOCATION))
+        // similar to databaseExists: if the catalog doesn't support namespaces, we assume it's an
+        // implicit namespace, which exists but has no metadata.
+        case ResolvedNamespace(catalog: CatalogPlugin, namespace) =>
+          new Database(
+            name = dbName,

Review Comment:
   let's generate database name from `namespace`




[GitHub] [spark] amaliujia commented on pull request #36968: [SPARK-39645][SQL] Make getDatabase and listDatabases compatible with 3 layer namespace

Posted by GitBox <gi...@apache.org>.
amaliujia commented on PR #36968:
URL: https://github.com/apache/spark/pull/36968#issuecomment-1171871432

   @zhengruifeng this one is merged and the Python side is unblocked.



[GitHub] [spark] zhengruifeng commented on pull request #36968: [SPARK-39235][SQL] Make getDatabase and listDatabases compatible with 3 layer namespace

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on PR #36968:
URL: https://github.com/apache/spark/pull/36968#issuecomment-1170687902

   @schuermannator Would you mind adding two subtasks under the [umbrella](https://issues.apache.org/jira/browse/SPARK-39235) and linking the two PRs (this one and https://github.com/apache/spark/pull/36969) to the new subtasks?



[GitHub] [spark] schuermannator commented on a diff in pull request #36968: [SPARK-39235][SQL] Make getDatabase and listDatabases compatible with 3 layer namespace

Posted by GitBox <gi...@apache.org>.
schuermannator commented on code in PR #36968:
URL: https://github.com/apache/spark/pull/36968#discussion_r909218016


##########
sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala:
##########
@@ -243,7 +247,30 @@ class CatalogImpl(sparkSession: SparkSession) extends Catalog {
    * `Database` can be found.
    */
   override def getDatabase(dbName: String): Database = {
-    makeDatabase(dbName)
+    // `dbName` could be either a single database name (behavior in Spark 3.3 and prior) or a
+    // qualified namespace with catalog name. To maintain backwards compatibility, we first assume
+    // it's a single database name and return the database from sessionCatalog if it exists.
+    // Otherwise we try 3-part name parsing and locate the database. If the parsed identifier
+    // contains both catalog name and database name, we then search the database in the catalog.
+    if (sessionCatalog.databaseExists(dbName) || sessionCatalog.isGlobalTempViewDB(dbName)) {
+      makeDatabase(dbName)
+    } else {
+      val ident = sparkSession.sessionState.sqlParser.parseMultipartIdentifier(dbName)
+      val plan = UnresolvedNamespace(ident)
+      val resolved = sparkSession.sessionState.executePlan(plan).analyzed
+      val db = ident.tail
+      resolved match {
+        case ResolvedNamespace(catalog: SupportsNamespaces, _) =>
+          val metadata = catalog.loadNamespaceMetadata(db.toArray)
+          new Database(
+            name = db.mkString("."),

Review Comment:
   yep using that now :)




[GitHub] [spark] amaliujia commented on a diff in pull request #36968: [SPARK-39235][SQL] make getDatabase compatible with 3 layer namespace

Posted by GitBox <gi...@apache.org>.
amaliujia commented on code in PR #36968:
URL: https://github.com/apache/spark/pull/36968#discussion_r905419637


##########
sql/core/src/test/scala/org/apache/spark/sql/internal/CatalogSuite.scala:
##########
@@ -743,4 +743,25 @@ class CatalogSuite extends SharedSparkSession with AnalysisTest with BeforeAndAf
     val catalogName2 = "catalog_not_exists"
     assert(!spark.catalog.databaseExists(Array(catalogName2, dbName).mkString(".")))
   }
+
+  test("three layer namespace compatibility - get database") {
+    Seq(("testcat", "somedb"), ("testcat", "ns.somedb")).foreach { case (catalog, dbName) =>

Review Comment:
   how about adding another case of ("spark_catalog", "somedb")?



##########
sql/core/src/test/scala/org/apache/spark/sql/internal/CatalogSuite.scala:
##########
@@ -743,4 +743,25 @@ class CatalogSuite extends SharedSparkSession with AnalysisTest with BeforeAndAf
     val catalogName2 = "catalog_not_exists"
     assert(!spark.catalog.databaseExists(Array(catalogName2, dbName).mkString(".")))
   }
+
+  test("three layer namespace compatibility - get database") {
+    Seq(("testcat", "somedb"), ("testcat", "ns.somedb")).foreach { case (catalog, dbName) =>
+      val qualifiedDb = s"$catalog.$dbName"
+      // TODO test properties? WITH DBPROPERTIES (prop='val')

Review Comment:
   +1 to test the complete namespace metadata.



##########
sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala:
##########
@@ -243,7 +243,29 @@ class CatalogImpl(sparkSession: SparkSession) extends Catalog {
    * `Database` can be found.
    */
   override def getDatabase(dbName: String): Database = {
-    makeDatabase(dbName)
+    // `dbName` could be either a single database name (behavior in Spark 3.3 and prior) or a
+    // qualified namespace with catalog name. To maintain backwards compatibility, we first assume
+    // it's a single database name and return the database from sessionCatalog if it exists.
+    // Otherwise we try 3-part name parsing and locate the database. If the parsed identifier
+    // contains both catalog name and database name, we then search the database in the catalog.
+    if (sessionCatalog.databaseExists(dbName) || sessionCatalog.isGlobalTempViewDB(dbName)) {
+      makeDatabase(dbName)
+    } else {
+      val ident = sparkSession.sessionState.sqlParser.parseMultipartIdentifier(dbName)
+      val plan = UnresolvedNamespace(ident)
+      val resolved = sparkSession.sessionState.executePlan(plan).analyzed
+      val db = ident.tail
+      val metadata = resolved match {
+        case ResolvedNamespace(catalog: SupportsNamespaces, _) =>
+          catalog.loadNamespaceMetadata(db.toArray)
+        // TODO what to do if it doesn't support namespaces
+        case _ => throw new RuntimeException(s"unexpected catalog resolved: $resolved")

Review Comment:
   cc @cloud-fan 
   
   WDYT?




[GitHub] [spark] schuermannator commented on a diff in pull request #36968: [SPARK-39235][SQL] Make getDatabase and listDatabases compatible with 3 layer namespace

Posted by GitBox <gi...@apache.org>.
schuermannator commented on code in PR #36968:
URL: https://github.com/apache/spark/pull/36968#discussion_r907780722


##########
sql/core/src/test/scala/org/apache/spark/sql/internal/CatalogSuite.scala:
##########
@@ -749,4 +756,39 @@ class CatalogSuite extends SharedSparkSession with AnalysisTest with BeforeAndAf
     assert(spark.catalog.currentCatalog().equals("spark_catalog"))
     assert(spark.catalog.listCatalogs().collect().map(c => c.name).toSet == Set("testcat"))
   }
+
+  test("three layer namespace compatibility - get database") {
+    val catalogsAndDatabases =
+      Seq(("testcat", "somedb"), ("testcat", "ns.somedb"), ("spark_catalog", "somedb"))
+    catalogsAndDatabases.foreach { case (catalog, dbName) =>
+      val qualifiedDb = s"$catalog.$dbName"
+      sql(s"CREATE NAMESPACE $qualifiedDb COMMENT 'test comment' LOCATION '/test/location'")
+      val db = spark.catalog.getDatabase(qualifiedDb)
+      assert(db.name === dbName)
+      assert(db.description === "test comment")
+      assert(db.locationUri === "file:/test/location")
+    }
+    intercept[AnalysisException](spark.catalog.getDatabase("randomcat.db10"))
+  }
+
+  test("three layer namespace compatibility - get database, same in hive and testcat") {
+    // create 'testdb' in hive and testcat
+    val dbName = "testdb"
+    sql(s"CREATE NAMESPACE spark_catalog.$dbName COMMENT 'hive database'")
+    sql(s"CREATE NAMESPACE testcat.$dbName COMMENT 'testcat namespace'")
+    sql("SET CATALOG testcat")
+    // should still return the database in Hive
+    val db = spark.catalog.getDatabase(dbName)
+    assert(db.name === dbName)
+    assert(db.description === "hive database")
+    // TODO catalog check API?

Review Comment:
   removing comment then!




[GitHub] [spark] amaliujia commented on a diff in pull request #36968: [SPARK-39235][SQL] Make getDatabase and listDatabases compatible with 3 layer namespace

Posted by GitBox <gi...@apache.org>.
amaliujia commented on code in PR #36968:
URL: https://github.com/apache/spark/pull/36968#discussion_r907680192


##########
sql/core/src/test/scala/org/apache/spark/sql/internal/CatalogSuite.scala:
##########
@@ -749,4 +756,39 @@ class CatalogSuite extends SharedSparkSession with AnalysisTest with BeforeAndAf
     assert(spark.catalog.currentCatalog().equals("spark_catalog"))
     assert(spark.catalog.listCatalogs().collect().map(c => c.name).toSet == Set("testcat"))
   }
+
+  test("three layer namespace compatibility - get database") {
+    val catalogsAndDatabases =
+      Seq(("testcat", "somedb"), ("testcat", "ns.somedb"), ("spark_catalog", "somedb"))
+    catalogsAndDatabases.foreach { case (catalog, dbName) =>
+      val qualifiedDb = s"$catalog.$dbName"
+      sql(s"CREATE NAMESPACE $qualifiedDb COMMENT 'test comment' LOCATION '/test/location'")
+      val db = spark.catalog.getDatabase(qualifiedDb)
+      assert(db.name === dbName)
+      assert(db.description === "test comment")
+      assert(db.locationUri === "file:/test/location")
+    }
+    intercept[AnalysisException](spark.catalog.getDatabase("randomcat.db10"))
+  }
+
+  test("three layer namespace compatibility - get database, same in hive and testcat") {
+    // create 'testdb' in hive and testcat
+    val dbName = "testdb"
+    sql(s"CREATE NAMESPACE spark_catalog.$dbName COMMENT 'hive database'")
+    sql(s"CREATE NAMESPACE testcat.$dbName COMMENT 'testcat namespace'")
+    sql("SET CATALOG testcat")
+    // should still return the database in Hive
+    val db = spark.catalog.getDatabase(dbName)
+    assert(db.name === dbName)
+    assert(db.description === "hive database")
+    // TODO catalog check API?

Review Comment:
   There is no catalog field on the `Database` object, so this comment-based check seems good.




[GitHub] [spark] schuermannator commented on a diff in pull request #36968: [SPARK-39235][SQL] Make getDatabase and listDatabases compatible with 3 layer namespace

Posted by GitBox <gi...@apache.org>.
schuermannator commented on code in PR #36968:
URL: https://github.com/apache/spark/pull/36968#discussion_r908549964


##########
sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala:
##########
@@ -243,7 +247,30 @@ class CatalogImpl(sparkSession: SparkSession) extends Catalog {
    * `Database` can be found.
    */
   override def getDatabase(dbName: String): Database = {
-    makeDatabase(dbName)
+    // `dbName` could be either a single database name (behavior in Spark 3.3 and prior) or a
+    // qualified namespace with catalog name. To maintain backwards compatibility, we first assume
+    // it's a single database name and return the database from sessionCatalog if it exists.
+    // Otherwise we try 3-part name parsing and locate the database. If the parsed identifier
+    // contains both catalog name and database name, we then search the database in the catalog.
+    if (sessionCatalog.databaseExists(dbName) || sessionCatalog.isGlobalTempViewDB(dbName)) {
+      makeDatabase(dbName)
+    } else {
+      val ident = sparkSession.sessionState.sqlParser.parseMultipartIdentifier(dbName)
+      val plan = UnresolvedNamespace(ident)
+      val resolved = sparkSession.sessionState.executePlan(plan).analyzed
+      val db = ident.tail
+      resolved match {
+        case ResolvedNamespace(catalog: SupportsNamespaces, _) =>
+          val metadata = catalog.loadNamespaceMetadata(db.toArray)

Review Comment:
   Good point. I will also add a test for the case without a catalog name.




[GitHub] [spark] schuermannator commented on a diff in pull request #36968: [SPARK-39235][SQL] Make getDatabase and listDatabases compatible with 3 layer namespace

Posted by GitBox <gi...@apache.org>.
schuermannator commented on code in PR #36968:
URL: https://github.com/apache/spark/pull/36968#discussion_r907669226


##########
sql/core/src/test/scala/org/apache/spark/sql/internal/CatalogSuite.scala:
##########
@@ -743,4 +743,25 @@ class CatalogSuite extends SharedSparkSession with AnalysisTest with BeforeAndAf
     val catalogName2 = "catalog_not_exists"
     assert(!spark.catalog.databaseExists(Array(catalogName2, dbName).mkString(".")))
   }
+
+  test("three layer namespace compatibility - get database") {
+    Seq(("testcat", "somedb"), ("testcat", "ns.somedb")).foreach { case (catalog, dbName) =>
+      val qualifiedDb = s"$catalog.$dbName"
+      // TODO test properties? WITH DBPROPERTIES (prop='val')

Review Comment:
   Correct, I will skip it.




[GitHub] [spark] cloud-fan commented on a diff in pull request #36968: [SPARK-39235][SQL] Make getDatabase and listDatabases compatible with 3 layer namespace

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #36968:
URL: https://github.com/apache/spark/pull/36968#discussion_r910528646


##########
sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala:
##########
@@ -296,7 +300,36 @@ class CatalogImpl(sparkSession: SparkSession) extends Catalog {
    * `Database` can be found.
    */
   override def getDatabase(dbName: String): Database = {
-    makeDatabase(dbName)
+    // `dbName` could be either a single database name (behavior in Spark 3.3 and prior) or a
+    // qualified namespace with catalog name. To maintain backwards compatibility, we first assume
+    // it's a single database name and return the database from sessionCatalog if it exists.
+    // Otherwise we try 3-part name parsing and locate the database. If the parsed identifier
+    // contains both catalog name and database name, we then search the database in the catalog.
+    if (sessionCatalog.databaseExists(dbName) || sessionCatalog.isGlobalTempViewDB(dbName)) {
+      makeDatabase(dbName)
+    } else {
+      val ident = sparkSession.sessionState.sqlParser.parseMultipartIdentifier(dbName)
+      val plan = UnresolvedNamespace(ident)
+      val resolved = sparkSession.sessionState.executePlan(plan).analyzed
+      resolved match {
+        case ResolvedNamespace(catalog: SupportsNamespaces, namespace) =>
+          val metadata = catalog.loadNamespaceMetadata(namespace.toArray)
+          new Database(
+            name = namespace.mkString("."),

Review Comment:
   let's use `MultipartIdentifierHelper.quoted`
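
   For context, the difference matters when a namespace part itself contains a dot. A rough, self-contained illustration (this `quoteIfNeeded` is a hypothetical stand-in for the quoting done via `CatalogV2Implicits`, not the real implementation):

   ```scala
   // Hypothetical stand-in for the identifier quoting behind `quoted`.
   def quoteIfNeeded(part: String): String =
     if (part.matches("[a-zA-Z0-9_]+")) part else s"`${part.replace("`", "``")}`"

   val namespace = Seq("ns", "some.db")
   // Plain mkString(".") is ambiguous: "ns.some.db" could be two or three parts.
   assert(namespace.mkString(".") == "ns.some.db")
   // Quoting each part first keeps the boundaries unambiguous: ns.`some.db`
   assert(namespace.map(quoteIfNeeded).mkString(".") == "ns.`some.db`")
   ```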




[GitHub] [spark] amaliujia commented on a diff in pull request #36968: [SPARK-39235][SQL] Make getDatabase and listDatabases compatible with 3 layer namespace

Posted by GitBox <gi...@apache.org>.
amaliujia commented on code in PR #36968:
URL: https://github.com/apache/spark/pull/36968#discussion_r908791594


##########
sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala:
##########
@@ -243,7 +247,30 @@ class CatalogImpl(sparkSession: SparkSession) extends Catalog {
    * `Database` can be found.
    */
   override def getDatabase(dbName: String): Database = {
-    makeDatabase(dbName)
+    // `dbName` could be either a single database name (behavior in Spark 3.3 and prior) or a
+    // qualified namespace with catalog name. To maintain backwards compatibility, we first assume
+    // it's a single database name and return the database from sessionCatalog if it exists.
+    // Otherwise we try 3-part name parsing and locate the database. If the parsed identifier
+    // contains both catalog name and database name, we then search the database in the catalog.
+    if (sessionCatalog.databaseExists(dbName) || sessionCatalog.isGlobalTempViewDB(dbName)) {
+      makeDatabase(dbName)
+    } else {
+      val ident = sparkSession.sessionState.sqlParser.parseMultipartIdentifier(dbName)
+      val plan = UnresolvedNamespace(ident)
+      val resolved = sparkSession.sessionState.executePlan(plan).analyzed
+      val db = ident.tail
+      resolved match {
+        case ResolvedNamespace(catalog: SupportsNamespaces, _) =>
+          val metadata = catalog.loadNamespaceMetadata(db.toArray)
+          new Database(

Review Comment:
   We have already done this when adding `catalog` to `Table`, so applying the same idea to `Database` should be fine: https://github.com/apache/spark/blob/458f8a7bd9c94e249bc094f095090651cabbd535/sql/core/src/main/scala/org/apache/spark/sql/catalog/interface.scala#L87
   
   
   The only thing I foresee is that it could cause a few more test failures that will need fixing (as the Database case class constructor will be changed).
   
   We can maintain backward compatibility by retaining the old constructor, like: https://github.com/apache/spark/blob/458f8a7bd9c94e249bc094f095090651cabbd535/sql/core/src/main/scala/org/apache/spark/sql/catalog/interface.scala#L94
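
   A minimal sketch of what that could look like for `Database`, following the `Table` precedent linked above (the field order, nullability, and exact constructor shape here are assumptions for illustration, not the final API):

   ```scala
   import javax.annotation.Nullable

   import org.apache.spark.sql.catalyst.DefinedByConstructorParams

   class Database(
       val name: String,
       @Nullable val catalog: String,
       @Nullable val description: String,
       val locationUri: String)
     extends DefinedByConstructorParams {

     // Keep the old 3-argument constructor so existing callers still compile,
     // mirroring what was done when `catalog` was added to `Table`.
     def this(name: String, description: String, locationUri: String) =
       this(name, null, description, locationUri)
   }
   ```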
   




[GitHub] [spark] cloud-fan commented on a diff in pull request #36968: [SPARK-39235][SQL] Make getDatabase and listDatabases compatible with 3 layer namespace

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #36968:
URL: https://github.com/apache/spark/pull/36968#discussion_r908109474


##########
sql/core/src/test/scala/org/apache/spark/sql/internal/CatalogSuite.scala:
##########
@@ -749,4 +756,38 @@ class CatalogSuite extends SharedSparkSession with AnalysisTest with BeforeAndAf
     assert(spark.catalog.currentCatalog().equals("spark_catalog"))
     assert(spark.catalog.listCatalogs().collect().map(c => c.name).toSet == Set("testcat"))
   }
+
+  test("three layer namespace compatibility - get database") {
+    val catalogsAndDatabases =
+      Seq(("testcat", "somedb"), ("testcat", "ns.somedb"), ("spark_catalog", "somedb"))
+    catalogsAndDatabases.foreach { case (catalog, dbName) =>
+      val qualifiedDb = s"$catalog.$dbName"
+      sql(s"CREATE NAMESPACE $qualifiedDb COMMENT 'test comment' LOCATION '/test/location'")

Review Comment:
   nit: can we put `qualifiedDb` into the comment? otherwise the test still passes even if we resolve to the wrong database.
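   
   For example (a sketch reusing the names from the test above), the comment could carry the qualified name, so a lookup that resolves the wrong database fails the assertion:
   
   ```scala
   catalogsAndDatabases.foreach { case (catalog, dbName) =>
     val qualifiedDb = s"$catalog.$dbName"
     // Embed the qualified name in the comment so resolving the wrong
     // database cannot accidentally satisfy the assertion below.
     sql(s"CREATE NAMESPACE $qualifiedDb COMMENT '$qualifiedDb' LOCATION '/test/location'")
     assert(spark.catalog.getDatabase(qualifiedDb).description === qualifiedDb)
   }
   ```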



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] schuermannator commented on a diff in pull request #36968: [SPARK-39235][SQL] Make getDatabase and listDatabases compatible with 3 layer namespace

Posted by GitBox <gi...@apache.org>.
schuermannator commented on code in PR #36968:
URL: https://github.com/apache/spark/pull/36968#discussion_r908548809


##########
sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala:
##########
@@ -243,7 +247,30 @@ class CatalogImpl(sparkSession: SparkSession) extends Catalog {
    * `Database` can be found.
    */
   override def getDatabase(dbName: String): Database = {
-    makeDatabase(dbName)
+    // `dbName` could be either a single database name (behavior in Spark 3.3 and prior) or a
+    // qualified namespace with catalog name. To maintain backwards compatibility, we first assume
+    // it's a single database name and return the database from sessionCatalog if it exists.
+    // Otherwise we try 3-part name parsing and locate the database. If the parsed identifier
+    // contains both a catalog name and a database name, we then look up the database in that catalog.
+    if (sessionCatalog.databaseExists(dbName) || sessionCatalog.isGlobalTempViewDB(dbName)) {
+      makeDatabase(dbName)
+    } else {
+      val ident = sparkSession.sessionState.sqlParser.parseMultipartIdentifier(dbName)
+      val plan = UnresolvedNamespace(ident)
+      val resolved = sparkSession.sessionState.executePlan(plan).analyzed
+      val db = ident.tail
+      resolved match {
+        case ResolvedNamespace(catalog: SupportsNamespaces, _) =>
+          val metadata = catalog.loadNamespaceMetadata(db.toArray)
+          new Database(

Review Comment:
   cc @amaliujia. This seems reasonable; maybe we should discuss? I suppose augmenting the type wouldn't lead to any backwards-compat issues?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a diff in pull request #36968: [SPARK-39235][SQL] Make getDatabase and listDatabases compatible with 3 layer namespace

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #36968:
URL: https://github.com/apache/spark/pull/36968#discussion_r908106214


##########
sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala:
##########
@@ -243,7 +247,30 @@ class CatalogImpl(sparkSession: SparkSession) extends Catalog {
    * `Database` can be found.
    */
   override def getDatabase(dbName: String): Database = {
-    makeDatabase(dbName)
+    // `dbName` could be either a single database name (behavior in Spark 3.3 and prior) or a
+    // qualified namespace with catalog name. To maintain backwards compatibility, we first assume
+    // it's a single database name and return the database from sessionCatalog if it exists.
+    // Otherwise we try 3-part name parsing and locate the database. If the parsed identifier
+    // contains both a catalog name and a database name, we then look up the database in that catalog.
+    if (sessionCatalog.databaseExists(dbName) || sessionCatalog.isGlobalTempViewDB(dbName)) {
+      makeDatabase(dbName)
+    } else {
+      val ident = sparkSession.sessionState.sqlParser.parseMultipartIdentifier(dbName)
+      val plan = UnresolvedNamespace(ident)
+      val resolved = sparkSession.sessionState.executePlan(plan).analyzed
+      val db = ident.tail
+      resolved match {
+        case ResolvedNamespace(catalog: SupportsNamespaces, _) =>
+          val metadata = catalog.loadNamespaceMetadata(db.toArray)

Review Comment:
   `db` assumes the first name part is always the catalog name, which is not always true. We should use `ResolvedNamespace.namespace`.
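   
   Something along these lines (a sketch reusing the names from the diff; the point is that the namespace comes from the resolved plan rather than from `ident.tail`):
   
   ```scala
   val metadata = resolved match {
     case ResolvedNamespace(catalog: SupportsNamespaces, ns) =>
       // `ns` is whatever namespace the analyzer resolved, so a name like
       // `ns1.ns2` under the current catalog is handled correctly as well.
       catalog.loadNamespaceMetadata(ns.toArray)
   }
   // ... then build the Database from `metadata` as in the diff above
   ```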



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a diff in pull request #36968: [SPARK-39235][SQL] Make getDatabase compatible with 3 layer namespace

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #36968:
URL: https://github.com/apache/spark/pull/36968#discussion_r905773687


##########
sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala:
##########
@@ -243,7 +243,29 @@ class CatalogImpl(sparkSession: SparkSession) extends Catalog {
    * `Database` can be found.
    */
   override def getDatabase(dbName: String): Database = {
-    makeDatabase(dbName)
+    // `dbName` could be either a single database name (behavior in Spark 3.3 and prior) or a
+    // qualified namespace with catalog name. To maintain backwards compatibility, we first assume
+    // it's a single database name and return the database from sessionCatalog if it exists.
+    // Otherwise we try 3-part name parsing and locate the database. If the parsed identifier
+    // contains both a catalog name and a database name, we then look up the database in that catalog.
+    if (sessionCatalog.databaseExists(dbName) || sessionCatalog.isGlobalTempViewDB(dbName)) {
+      makeDatabase(dbName)
+    } else {
+      val ident = sparkSession.sessionState.sqlParser.parseMultipartIdentifier(dbName)
+      val plan = UnresolvedNamespace(ident)
+      val resolved = sparkSession.sessionState.executePlan(plan).analyzed
+      val db = ident.tail
+      val metadata = resolved match {
+        case ResolvedNamespace(catalog: SupportsNamespaces, _) =>
+          catalog.loadNamespaceMetadata(db.toArray)
+        // TODO what to do if it doesn't support namespaces
+        case _ => throw new RuntimeException(s"unexpected catalog resolved: $resolved")

Review Comment:
   Let's follow `databaseExists`: if the catalog doesn't support namespace, we assume it's an implicit namespace, which exists but has no metadata.
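   
   Sketched out with the same names as the diff (the `java.util.Collections` fallback here is just one way to express "exists but has no metadata"):
   
   ```scala
   import java.util.Collections
   
   val metadata: java.util.Map[String, String] = resolved match {
     case ResolvedNamespace(catalog: SupportsNamespaces, _) =>
       catalog.loadNamespaceMetadata(db.toArray)
     case _: ResolvedNamespace =>
       // The catalog doesn't support namespaces: treat it as an implicit
       // namespace that exists but has no metadata, mirroring `databaseExists`.
       Collections.emptyMap[String, String]()
   }
   ```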



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a diff in pull request #36968: [SPARK-39235][SQL] Make getDatabase compatible with 3 layer namespace

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #36968:
URL: https://github.com/apache/spark/pull/36968#discussion_r905777243


##########
sql/core/src/test/scala/org/apache/spark/sql/internal/CatalogSuite.scala:
##########
@@ -743,4 +743,25 @@ class CatalogSuite extends SharedSparkSession with AnalysisTest with BeforeAndAf
     val catalogName2 = "catalog_not_exists"
     assert(!spark.catalog.databaseExists(Array(catalogName2, dbName).mkString(".")))
   }
+
+  test("three layer namespace compatibility - get database") {
+    Seq(("testcat", "somedb"), ("testcat", "ns.somedb")).foreach { case (catalog, dbName) =>
+      val qualifiedDb = s"$catalog.$dbName"
+      // TODO test properties? WITH DBPROPERTIES (prop='val')
+      sql(s"CREATE NAMESPACE $qualifiedDb COMMENT 'test comment' LOCATION '/test/location'")
+      val db = spark.catalog.getDatabase(qualifiedDb)
+      assert(db.name === dbName)
+      assert(db.description === "test comment")
+      assert(db.locationUri === "file:/test/location")
+    }
+    intercept[AnalysisException](spark.catalog.getDatabase("randomcat.db10"))
+  }
+
+  test("get database when there is `default` catalog") {
+    spark.conf.set("spark.sql.catalog.default", classOf[InMemoryCatalog].getName)
+    val db = "testdb"
+    val qualified = s"default.$db"
+    sql(s"CREATE NAMESPACE $qualified")

Review Comment:
   I think what we should test is: there is a database called `testdb` in the Hive Metastore, and also a namespace called `testdb` in `testcat`. The current catalog is `testcat` (we can change it via `sql("SET CATALOG testcat")`), and `getDatabase` should return the database in the Hive Metastore. We can check which catalog was used via `Database.catalog`.
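   
   Roughly (a sketch; the `db.catalog` assertion assumes the `Database.catalog` field discussed earlier in this review):
   
   ```scala
   test("get database prefers the session catalog for single-part names") {
     sql("CREATE DATABASE testdb")            // in the session catalog (Hive Metastore)
     sql("CREATE NAMESPACE testcat.testdb")   // a namespace with the same name in the v2 catalog
     sql("SET CATALOG testcat")               // make testcat the current catalog
     val db = spark.catalog.getDatabase("testdb")
     // A single-part name should still resolve against the session catalog.
     assert(db.catalog === "spark_catalog")
   }
   ```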



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org