You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/06/23 21:14:39 UTC

[GitHub] [spark] schuermannator opened a new pull request, #36969: [SPARK-39235][SQL] make setCurrentDatabase compatible with 3 layer namespace

schuermannator opened a new pull request, #36969:
URL: https://github.com/apache/spark/pull/36969

   <!--
   Thanks for sending a pull request!  Here are some tips for you:
     1. If this is your first time, please read our contributor guidelines: https://spark.apache.org/contributing.html
     2. Ensure you have added or run the appropriate tests for your PR: https://spark.apache.org/developer-tools.html
     3. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP][SPARK-XXXX] Your PR title ...'.
     4. Be sure to keep the PR description updated to reflect all changes.
     5. Please write your PR title to summarize what this PR proposes.
     6. If possible, provide a concise example to reproduce the issue for a faster review.
     7. If you want to add a new configuration, please read the guideline first for naming configurations in
        'core/src/main/scala/org/apache/spark/internal/config/ConfigEntry.scala'.
     8. If you want to add or modify an error type or message, please read the guideline first in
        'core/src/main/resources/error/README.md'.
   -->
   
   ### What changes were proposed in this pull request?
   <!--
   Please clarify what changes you are proposing. The purpose of this section is to outline the changes and how this PR fixes the issue. 
   If possible, please consider writing useful notes for better and faster reviews in your PR. See the examples below.
     1. If you refactor some codes with changing classes, showing the class hierarchy will help reviewers.
     2. If you fix some SQL features, you can provide some references of other DBMSes.
     3. If there is design documentation, please add the link.
     4. If there is a discussion in the mailing list, please add the link.
   -->
   
   Change `setCurrentDatabase` catalog API to support 3 layer namespace. If the database exists in the sessionCatalog, we set to that. Otherwise, parse the name as 3 layer name and use V2 catalog to check if it exists, then set current database.
   
   
   ### Why are the changes needed?
   <!--
   Please clarify why the changes are needed. For instance,
     1. If you propose a new API, clarify the use case for a new API.
     2. If you fix a bug, you can clarify why it is a bug.
   -->
   `setCurrentDatabase` doesn't support 3 layer namespace.
   
   
   ### Does this PR introduce _any_ user-facing change?
   <!--
   Note that it means *any* user-facing change including all aspects such as the documentation fix.
   If yes, please clarify the previous behavior and the change this PR proposes - provide the console output, description and/or an example to show the behavior difference if possible.
   If possible, please also clarify if this is a user-facing change compared to the released Spark versions or within the unreleased branches such as master.
   If no, write 'No'.
   -->
   Yes. This PR introduces a backwards-compatible API change to support 3 layer namespace (e.g. catalog.database.table) for `setCurrentDatabase`.
   
   
   ### How was this patch tested?
   <!--
   If tests were added, say they were added here. Please make sure to add some test cases that check the changes thoroughly including negative and positive cases if possible.
   If it was tested in a way different from regular unit tests, please clarify how you tested step by step, ideally copy and paste-able, so that other reviewers can test and check, and descendants can verify in the future.
   If tests were not added, please describe why they were not added and/or why it was difficult to add.
   If benchmark tests were added, please run the benchmarks in GitHub Actions for the consistent environment, and the instructions could accord to: https://spark.apache.org/developer-tools.html#github-workflow-benchmarks.
   -->
   UT
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on pull request #36969: [SPARK-39235][SQL] Make setCurrentDatabase compatible with 3 layer namespace

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on PR #36969:
URL: https://github.com/apache/spark/pull/36969#issuecomment-1165043541

   cc @zhengruifeng FYI


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] amaliujia commented on a diff in pull request #36969: [SPARK-39235][SQL] make setCurrentDatabase compatible with 3 layer namespace

Posted by GitBox <gi...@apache.org>.
amaliujia commented on code in PR #36969:
URL: https://github.com/apache/spark/pull/36969#discussion_r905474007


##########
sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala:
##########
@@ -43,30 +43,47 @@ class CatalogImpl(sparkSession: SparkSession) extends Catalog {
 
   private def sessionCatalog: SessionCatalog = sparkSession.sessionState.catalog
 
-  private def requireDatabaseExists(dbName: String): Unit = {
-    if (!sessionCatalog.databaseExists(dbName)) {
-      throw QueryCompilationErrors.databaseDoesNotExistError(dbName)
-    }
-  }
-
   private def requireTableExists(dbName: String, tableName: String): Unit = {
     if (!sessionCatalog.tableExists(TableIdentifier(tableName, Some(dbName)))) {
       throw QueryCompilationErrors.tableDoesNotExistInDatabaseError(tableName, dbName)
     }
   }
 
+  // FIXME
+  private var currentDatabaseV2: Option[String] = None
+
   /**
    * Returns the current default database in this session.
    */
-  override def currentDatabase: String = sessionCatalog.getCurrentDatabase
+  override def currentDatabase: String =
+    currentDatabaseV2.getOrElse(sessionCatalog.getCurrentDatabase)
 
   /**
    * Sets the current default database in this session.
    */
   @throws[AnalysisException]("database does not exist")
   override def setCurrentDatabase(dbName: String): Unit = {
-    requireDatabaseExists(dbName)
-    sessionCatalog.setCurrentDatabase(dbName)
+    // `dbName` could be either a single database name (behavior in Spark 3.3 and prior) or
+    // a qualified namespace with catalog name. We assume it's a single database name
+    // and check if we can find the dbName in sessionCatalog. If so, we set the use old
+    // sessionCatalog APIs to set current database. Otherwise, we try 3-part name parsing and
+    // check if the database exists with catalogV2 APIs
+    if (sessionCatalog.databaseExists(dbName)) {

Review Comment:
   Probably no need to do all of this.
   
   You can use https://github.com/apache/spark/blob/a3fdf3b107b62089890a4246c168ca516dff5d44/sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogManager.scala#L106.
   
   You can access the catalog manager by `sparkSession.sessionState.catalogManager`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng commented on pull request #36969: [SPARK-39646][SQL] Make setCurrentDatabase compatible with 3 layer namespace

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on PR #36969:
URL: https://github.com/apache/spark/pull/36969#issuecomment-1172796711

   Merged to master!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] schuermannator commented on a diff in pull request #36969: [SPARK-39235][SQL] Make setCurrentDatabase compatible with 3 layer namespace

Posted by GitBox <gi...@apache.org>.
schuermannator commented on code in PR #36969:
URL: https://github.com/apache/spark/pull/36969#discussion_r908740631


##########
sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala:
##########
@@ -58,15 +52,24 @@ class CatalogImpl(sparkSession: SparkSession) extends Catalog {
   /**
    * Returns the current default database in this session.
    */
-  override def currentDatabase: String = sessionCatalog.getCurrentDatabase
+  override def currentDatabase: String =
+    sparkSession.sessionState.catalogManager.currentNamespace.mkString(".")
 
   /**
    * Sets the current default database in this session.
    */
   @throws[AnalysisException]("database does not exist")
   override def setCurrentDatabase(dbName: String): Unit = {
-    requireDatabaseExists(dbName)
-    sessionCatalog.setCurrentDatabase(dbName)
+    val ident = sparkSession.sessionState.sqlParser.parseMultipartIdentifier(dbName)

Review Comment:
   oh so if we have current catalog `testcat` and the user calls `setCurrentDatabse(testcat.testdb)` then it will set the current database to a database named `testcat.testdb` in `testcat` catalog (so something like `testcat.testcat.db`)?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] schuermannator commented on a diff in pull request #36969: [SPARK-39646][SQL] Make setCurrentDatabase compatible with 3 layer namespace

Posted by GitBox <gi...@apache.org>.
schuermannator commented on code in PR #36969:
URL: https://github.com/apache/spark/pull/36969#discussion_r911370052


##########
sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala:
##########
@@ -58,15 +52,24 @@ class CatalogImpl(sparkSession: SparkSession) extends Catalog {
   /**
    * Returns the current default database in this session.
    */
-  override def currentDatabase: String = sessionCatalog.getCurrentDatabase
+  override def currentDatabase: String =
+    sparkSession.sessionState.catalogManager.currentNamespace.mkString(".")
 
   /**
    * Sets the current default database in this session.
    */
   @throws[AnalysisException]("database does not exist")
   override def setCurrentDatabase(dbName: String): Unit = {
-    requireDatabaseExists(dbName)
-    sessionCatalog.setCurrentDatabase(dbName)
+    val ident = sparkSession.sessionState.sqlParser.parseMultipartIdentifier(dbName)

Review Comment:
   talked offline - we will parse but not mess with catalog prefix



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng commented on pull request #36969: [SPARK-39646][SQL] Make setCurrentDatabase compatible with 3 layer namespace

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on PR #36969:
URL: https://github.com/apache/spark/pull/36969#issuecomment-1171983274

   SparkR failed:
   ```
   ══ Failed ══════════════════════════════════════════════════════════════════════
   ── 1. Failure (test_sparkSQL.R:4017:3): catalog APIs, currentDatabase, setCurren
   `setCurrentDatabase("zxwtyswklpf")` threw an error with unexpected message.
   Expected match: "Error in setCurrentDatabase : analysis error - Namespace 'zxwtyswklpf' does not exist"
   Actual message: "Error in setCurrentDatabase : no such database - Database 'zxwtyswklpf' not found"
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] amaliujia commented on a diff in pull request #36969: [SPARK-39235][SQL] make setCurrentDatabase compatible with 3 layer namespace

Posted by GitBox <gi...@apache.org>.
amaliujia commented on code in PR #36969:
URL: https://github.com/apache/spark/pull/36969#discussion_r905484781


##########
sql/core/src/test/scala/org/apache/spark/sql/internal/CatalogSuite.scala:
##########
@@ -743,4 +743,35 @@ class CatalogSuite extends SharedSparkSession with AnalysisTest with BeforeAndAf
     val catalogName2 = "catalog_not_exists"
     assert(!spark.catalog.databaseExists(Array(catalogName2, dbName).mkString(".")))
   }
+
+  test("three layer namespace compatibility - current database") {

Review Comment:
   hmm I think you can also update `currentDatabase` API and the test will be simplified:
   
   `setCurrentNamespace` then check `currentNamespace`
   
   https://github.com/databricks/runtime/blob/ca9cfac39f6ca160091cc3a1d7727ee0f784d044/sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala#L67



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] amaliujia commented on pull request #36969: [SPARK-39646][SQL] Make setCurrentDatabase compatible with 3 layer namespace

Posted by GitBox <gi...@apache.org>.
amaliujia commented on PR #36969:
URL: https://github.com/apache/spark/pull/36969#issuecomment-1171633000

   R: @cloud-fan 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #36969: [SPARK-39235][SQL] Make setCurrentDatabase compatible with 3 layer namespace

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on PR #36969:
URL: https://github.com/apache/spark/pull/36969#issuecomment-1166154279

   Can one of the admins verify this patch?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng closed pull request #36969: [SPARK-39646][SQL] Make setCurrentDatabase compatible with 3 layer namespace

Posted by GitBox <gi...@apache.org>.
zhengruifeng closed pull request #36969: [SPARK-39646][SQL] Make setCurrentDatabase compatible with 3 layer namespace
URL: https://github.com/apache/spark/pull/36969


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] amaliujia commented on a diff in pull request #36969: [SPARK-39235][SQL] make setCurrentDatabase compatible with 3 layer namespace

Posted by GitBox <gi...@apache.org>.
amaliujia commented on code in PR #36969:
URL: https://github.com/apache/spark/pull/36969#discussion_r905486093


##########
sql/core/src/test/scala/org/apache/spark/sql/internal/CatalogSuite.scala:
##########
@@ -743,4 +743,35 @@ class CatalogSuite extends SharedSparkSession with AnalysisTest with BeforeAndAf
     val catalogName2 = "catalog_not_exists"
     assert(!spark.catalog.databaseExists(Array(catalogName2, dbName).mkString(".")))
   }
+
+  test("three layer namespace compatibility - current database") {

Review Comment:
   and catalog manager has a `currentNamespace`: https://github.com/apache/spark/blob/a3fdf3b107b62089890a4246c168ca516dff5d44/sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogManager.scala#L92.
   
   So for changing `currentDatabase`, it could be as easy as calling the catalog manager `currentNamespace` 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] schuermannator commented on a diff in pull request #36969: [SPARK-39235][SQL] Make setCurrentDatabase compatible with 3 layer namespace

Posted by GitBox <gi...@apache.org>.
schuermannator commented on code in PR #36969:
URL: https://github.com/apache/spark/pull/36969#discussion_r908742024


##########
sql/core/src/test/scala/org/apache/spark/sql/internal/CatalogSuite.scala:
##########
@@ -749,4 +749,51 @@ class CatalogSuite extends SharedSparkSession with AnalysisTest with BeforeAndAf
     assert(spark.catalog.currentCatalog().equals("spark_catalog"))
     assert(spark.catalog.listCatalogs().collect().map(c => c.name).toSet == Set("testcat"))
   }
+
+  test("three layer namespace compatibility - current database") {
+    sql("CREATE NAMESPACE testcat.my_db")
+    // current catalog is spark_catalog, setting current databse to one in testcat will fail
+    val hiveErr = intercept[AnalysisException] {
+      spark.catalog.setCurrentDatabase("testcat.my_db")
+    }.getMessage
+    assert(hiveErr.contains("Namespace 'testcat.my_db' not found"))
+
+    // now switch catalogs and try again
+    // TODO? sql("USE CATALOG testcat")

Review Comment:
   I will follow up offline. I'm unsure why `USE CATALOG` wasn't working



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng commented on pull request #36969: [SPARK-39646][SQL] Make setCurrentDatabase compatible with 3 layer namespace

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on PR #36969:
URL: https://github.com/apache/spark/pull/36969#issuecomment-1171844014

   @schuermannator  there are some conflicts to reslove.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] schuermannator commented on a diff in pull request #36969: [SPARK-39235][SQL] Make setCurrentDatabase compatible with 3 layer namespace

Posted by GitBox <gi...@apache.org>.
schuermannator commented on code in PR #36969:
URL: https://github.com/apache/spark/pull/36969#discussion_r908743436


##########
sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala:
##########
@@ -58,15 +52,24 @@ class CatalogImpl(sparkSession: SparkSession) extends Catalog {
   /**
    * Returns the current default database in this session.
    */
-  override def currentDatabase: String = sessionCatalog.getCurrentDatabase
+  override def currentDatabase: String =
+    sparkSession.sessionState.catalogManager.currentNamespace.mkString(".")
 
   /**
    * Sets the current default database in this session.
    */
   @throws[AnalysisException]("database does not exist")
   override def setCurrentDatabase(dbName: String): Unit = {
-    requireDatabaseExists(dbName)
-    sessionCatalog.setCurrentDatabase(dbName)
+    val ident = sparkSession.sessionState.sqlParser.parseMultipartIdentifier(dbName)
+    // ident may or may not include catalog prefix
+    // if it includes the current catalog, strip it. otherwise consider it all to be namespace
+    val catalog = currentCatalog()
+    val ns = ident.headOption match {
+      // TODO case-sensitivity?

Review Comment:
   yea okay I will remove since we have decided not to care about catalog name



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] schuermannator commented on a diff in pull request #36969: [SPARK-39235][SQL] Make setCurrentDatabase compatible with 3 layer namespace

Posted by GitBox <gi...@apache.org>.
schuermannator commented on code in PR #36969:
URL: https://github.com/apache/spark/pull/36969#discussion_r908743054


##########
sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala:
##########
@@ -43,12 +43,6 @@ class CatalogImpl(sparkSession: SparkSession) extends Catalog {
 
   private def sessionCatalog: SessionCatalog = sparkSession.sessionState.catalog
 
-  private def requireDatabaseExists(dbName: String): Unit = {

Review Comment:
   since it is only used in two places that will both need to change its behavior I went ahead and inline here. I can revert since I'm no longer doing `listFunctions` (the second call site)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] schuermannator commented on pull request #36969: [SPARK-39646][SQL] Make setCurrentDatabase compatible with 3 layer namespace

Posted by GitBox <gi...@apache.org>.
schuermannator commented on PR #36969:
URL: https://github.com/apache/spark/pull/36969#issuecomment-1172686923

   good to go now!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] amaliujia commented on a diff in pull request #36969: [SPARK-39235][SQL] Make setCurrentDatabase compatible with 3 layer namespace

Posted by GitBox <gi...@apache.org>.
amaliujia commented on code in PR #36969:
URL: https://github.com/apache/spark/pull/36969#discussion_r908074187


##########
sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala:
##########
@@ -43,12 +43,6 @@ class CatalogImpl(sparkSession: SparkSession) extends Catalog {
 
   private def sessionCatalog: SessionCatalog = sparkSession.sessionState.catalog
 
-  private def requireDatabaseExists(dbName: String): Unit = {

Review Comment:
   why inline this function?



##########
sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala:
##########
@@ -58,15 +52,24 @@ class CatalogImpl(sparkSession: SparkSession) extends Catalog {
   /**
    * Returns the current default database in this session.
    */
-  override def currentDatabase: String = sessionCatalog.getCurrentDatabase
+  override def currentDatabase: String =
+    sparkSession.sessionState.catalogManager.currentNamespace.mkString(".")
 
   /**
    * Sets the current default database in this session.
    */
   @throws[AnalysisException]("database does not exist")
   override def setCurrentDatabase(dbName: String): Unit = {
-    requireDatabaseExists(dbName)
-    sessionCatalog.setCurrentDatabase(dbName)
+    val ident = sparkSession.sessionState.sqlParser.parseMultipartIdentifier(dbName)

Review Comment:
   No need to parse. Just use the input as DB name directly.
   
   Basically if we say this function should respect `currentCatalog`, then we don't expect users to pass in catalog name. What users should provide are only DB names and we don't treat the input special. 



##########
sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala:
##########
@@ -58,15 +52,24 @@ class CatalogImpl(sparkSession: SparkSession) extends Catalog {
   /**
    * Returns the current default database in this session.
    */
-  override def currentDatabase: String = sessionCatalog.getCurrentDatabase
+  override def currentDatabase: String =
+    sparkSession.sessionState.catalogManager.currentNamespace.mkString(".")
 
   /**
    * Sets the current default database in this session.
    */
   @throws[AnalysisException]("database does not exist")
   override def setCurrentDatabase(dbName: String): Unit = {
-    requireDatabaseExists(dbName)
-    sessionCatalog.setCurrentDatabase(dbName)
+    val ident = sparkSession.sessionState.sqlParser.parseMultipartIdentifier(dbName)
+    // ident may or may not include catalog prefix
+    // if it includes the current catalog, strip it. otherwise consider it all to be namespace
+    val catalog = currentCatalog()
+    val ns = ident.headOption match {
+      // TODO case-sensitivity?

Review Comment:
   I think we can be case insensitive as Hive metastore behaves this?
   
   Though I guess we don't need to check about the catalog name.



##########
sql/core/src/test/scala/org/apache/spark/sql/internal/CatalogSuite.scala:
##########
@@ -749,4 +749,51 @@ class CatalogSuite extends SharedSparkSession with AnalysisTest with BeforeAndAf
     assert(spark.catalog.currentCatalog().equals("spark_catalog"))
     assert(spark.catalog.listCatalogs().collect().map(c => c.name).toSet == Set("testcat"))
   }
+
+  test("three layer namespace compatibility - current database") {
+    sql("CREATE NAMESPACE testcat.my_db")
+    // current catalog is spark_catalog, setting current databse to one in testcat will fail
+    val hiveErr = intercept[AnalysisException] {
+      spark.catalog.setCurrentDatabase("testcat.my_db")
+    }.getMessage
+    assert(hiveErr.contains("Namespace 'testcat.my_db' not found"))
+
+    // now switch catalogs and try again
+    // TODO? sql("USE CATALOG testcat")

Review Comment:
   remove to do?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org