Posted to reviews@spark.apache.org by "panbingkun (via GitHub)" <gi...@apache.org> on 2023/11/10 09:04:24 UTC

[PR] [SPARK-45880][SQL] Introduce a new TableCatalog.listTable overload th… [spark]

panbingkun opened a new pull request, #43751:
URL: https://github.com/apache/spark/pull/43751

…at takes a pattern string for v2 catalog

### What changes were proposed in this pull request?

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

### Was this patch authored or co-authored using generative AI tooling?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [WIP][SPARK-45880][SQL] Introduce a new TableCatalog.listTable overload th… [spark]

Posted by "panbingkun (via GitHub)" <gi...@apache.org>.

panbingkun commented on code in PR #43751:
URL: https://github.com/apache/spark/pull/43751#discussion_r1505263771


##########
sql/core/src/test/resources/sql-tests/analyzer-results/show-views.sql.out:
##########
@@ -77,31 +77,31 @@ ShowViewsCommand global_temp, [namespace#x, viewName#x, isTemporary#x]
 
 
 -- !query
-SHOW VIEWS 'view_*'
+SHOW VIEWS 'view_%'
 -- !query analysis
-ShowViewsCommand showdb, view_*, [namespace#x, viewName#x, isTemporary#x]
+ShowViewsCommand showdb, view_%, [namespace#x, viewName#x, isTemporary#x]
 
 
 -- !query
-SHOW VIEWS LIKE 'view_1*|view_2*'

Review Comment:
   The OR syntax represented by | is no longer supported by default.
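Users who relied on `|` need a migration path, because under the default (non-legacy) behavior each alternative has to become its own pattern. The sketch below assumes a hypothetical helper, `LegacyToLikePatterns.convert`, which is not part of Spark. It handles only literal text plus the legacy '*' and '|'; other regex constructs such as '.' or '[a-z]' are not translated. It also assumes that '\' is accepted as the LIKE escape character, so that a literal '_' (as in `view_1*`) does not turn into a single-character wildcard.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical migration helper, not part of Spark.
final class LegacyToLikePatterns {
  private LegacyToLikePatterns() {}

  /** Rewrites a legacy pattern such as "a*|b*" into one SQL LIKE pattern per alternative. */
  static List<String> convert(String legacyPattern) {
    List<String> likePatterns = new ArrayList<>();
    // Legacy semantics: surrounding white space is ignored and '|' separates alternatives.
    for (String alternative : legacyPattern.trim().split("\\|")) {
      StringBuilder like = new StringBuilder();
      for (char c : alternative.toCharArray()) {
        if (c == '*') {
          like.append('%');                // legacy "any characters" -> LIKE '%'
        } else if (c == '%' || c == '_' || c == '\\') {
          like.append('\\').append(c);     // escape characters that are special in LIKE
        } else {
          like.append(c);                  // everything else is copied as-is
        }
      }
      likePatterns.add(like.toString());
    }
    return likePatterns;
  }
}
```

For example, `convert("view_1*|view_2*")` yields the two patterns `view\_1%` and `view\_2%`. Each pattern would then be used in its own `SHOW VIEWS LIKE` statement.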



Re: [PR] [WIP][SPARK-45880][SQL] Introduce a new TableCatalog.listTable overload th… [spark]

Posted by "panbingkun (via GitHub)" <gi...@apache.org>.

panbingkun commented on code in PR #43751:
URL: https://github.com/apache/spark/pull/43751#discussion_r1505586754


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala:
##########
@@ -107,27 +107,82 @@ object StringUtils extends Logging {
   def isFalseString(s: UTF8String): Boolean = falseStrings.contains(s.trimAll().toLowerCase)
   // scalastyle:on caselocale
 
+  def getAllMatchWildcard: String = {
+    if (SQLConf.get.legacyUseStarAndVerticalBarAsWildcardsInLikePattern) {
+      "*"
+    } else {
+      "%"
+    }
+  }
+
+  def filterPattern(names: Seq[String], pattern: String): Seq[String] = {
+    if (SQLConf.get.legacyUseStarAndVerticalBarAsWildcardsInLikePattern) {
+      filterPatternLegacy(names, pattern)
+    } else {
+      filterBySQLLikePattern(names, pattern)
+    }
+  }
+
   /**
-   * This utility can be used for filtering pattern in the "Like" of "Show Tables / Functions" DDL
+   * This legacy utility can be used for filtering pattern in the "Like" of
+   * "Show Tables / Functions" DDL.
    * @param names the names list to be filtered
    * @param pattern the filter pattern, only '*' and '|' are allowed as wildcards, others will
    *                follow regular expression convention, case insensitive match and white spaces
    *                on both ends will be ignored
    * @return the filtered names list in order
    */
-  def filterPattern(names: Seq[String], pattern: String): Seq[String] = {
+  def filterPatternLegacy(names: Seq[String], pattern: String): Seq[String] = {

Review Comment:
   This is only a rename from `filterPattern` to `filterPatternLegacy`.
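The legacy semantics described in the Scaladoc above can be sketched as follows. This is a hypothetical Java rendering, not Spark's actual Scala code. It trims the pattern, splits it on '|', and turns '*' into `.*`. Every other character keeps its regex meaning, matching is case-insensitive, and alternatives that are not valid regular expressions are silently skipped. The sketch collects matches into a sorted set, which is one assumed reading of "the filtered names list in order".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical Java rendering of the legacy '*' / '|' semantics; not Spark's StringUtils.
final class LegacyPatternFilter {
  private LegacyPatternFilter() {}

  static List<String> filterPatternLegacy(List<String> names, String pattern) {
    // Sorted, de-duplicated result (assumed reading of "in order").
    TreeSet<String> matched = new TreeSet<>();
    // White space on both ends is ignored; '|' separates alternatives.
    for (String subPattern : pattern.trim().split("\\|")) {
      Pattern regex;
      try {
        // '*' becomes ".*"; (?i) makes the whole alternative case-insensitive.
        regex = Pattern.compile("(?i)" + subPattern.replace("*", ".*"));
      } catch (PatternSyntaxException e) {
        continue; // Skip alternatives that are not valid regular expressions.
      }
      for (String name : names) {
        if (regex.matcher(name).matches()) {
          matched.add(name);
        }
      }
    }
    return new ArrayList<>(matched);
  }
}
```

For example, the pattern `" show_t1*|SHOW_T2* "` keeps `show_t1` and `show_t2` regardless of case. An invalid alternative such as `show_t[` is simply ignored.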



Re: [PR] [WIP][SPARK-45880][SQL] Introduce a new TableCatalog.listTable overload th… [spark]

Posted by "panbingkun (via GitHub)" <gi...@apache.org>.

panbingkun commented on code in PR #43751:
URL: https://github.com/apache/spark/pull/43751#discussion_r1505232316


##########
docs/sql-ref-syntax-aux-show-tables.md:
##########
@@ -40,12 +40,18 @@ SHOW TABLES [ { FROM | IN } database_name ] [ LIKE regex_pattern ]
 
 * **regex_pattern**
 
-     Specifies the regular expression pattern that is used to filter out unwanted tables. 
+     Specifies the regular expression pattern that is used to filter out unwanted tables.

Review Comment:
   Before:
   <img width="914" alt="image" src="https://github.com/apache/spark/assets/15246973/361b73f5-2d0e-4a80-a30a-7b82ef5a8577">
   
   After:
   <img width="915" alt="image" src="https://github.com/apache/spark/assets/15246973/43deb295-6dc6-4370-89cd-6bf1b73d1c80">
   



Re: [PR] [SPARK-45880][SQL] Introduce a new TableCatalog.listTable overload th… [spark]

Posted by "panbingkun (via GitHub)" <gi...@apache.org>.

panbingkun commented on code in PR #43751:
URL: https://github.com/apache/spark/pull/43751#discussion_r1495149980


##########
sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TableCatalog.java:
##########
@@ -97,6 +102,26 @@ public interface TableCatalog extends CatalogPlugin {
    */
   Identifier[] listTables(String[] namespace) throws NoSuchNamespaceException;
 
+  /**
+   * List the tables in a namespace from the catalog by pattern string.
+   * <p>
+   * If the catalog supports views, this must return identifiers for only tables and not views.
+   *
+   * @param namespace a multi-part namespace
+   * @param pattern the filter pattern, only '*' and '|' are allowed as wildcards, others will
+   *                follow regular expression convention, case-insensitive match and white spaces
+   *                on both ends will be ignored

Review Comment:
   I looked at the SHOW TABLES doc page (`https://spark.apache.org/docs/latest/sql-ref-syntax-aux-show-tables.html#parameters`) and found that its `regex_pattern` parameter explains this `pattern`.
   <img width="947" alt="image" src="https://github.com/apache/spark/assets/15246973/6349db2d-825e-4031-8f3e-4c984673f962">
   Thank you very much for the reminder; let's refer to it.
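A default method is one way to add the overload without breaking existing catalog implementations. The default lists all tables and filters them, and catalogs that can evaluate the pattern natively can override it. The sketch below uses a hypothetical, simplified `SimpleTableCatalog`, with plain `String` names instead of `Identifier` and no checked exceptions. It assumes SQL LIKE semantics for the pattern ('%' and '_'), in line with the direction the discussion later took. It is not the interface signature merged in this PR. Because the fallback delegates to `listTables(namespace)`, it inherits the rule that views are excluded.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical, simplified stand-in for TableCatalog (String names, no checked exceptions).
interface SimpleTableCatalog {
  String[] listTables(String[] namespace);

  // Fallback overload: list everything, then filter by a SQL LIKE pattern
  // ('%' = any sequence, '_' = exactly one character), case-insensitively.
  // A catalog that can evaluate the pattern server-side would override this.
  default String[] listTables(String[] namespace, String pattern) {
    StringBuilder regex = new StringBuilder();
    for (char c : pattern.toCharArray()) {
      if (c == '%') {
        regex.append(".*");
      } else if (c == '_') {
        regex.append('.');
      } else {
        regex.append(Pattern.quote(String.valueOf(c))); // everything else is literal
      }
    }
    Pattern compiled = Pattern.compile(regex.toString(), Pattern.CASE_INSENSITIVE);
    return Arrays.stream(listTables(namespace))
        .filter(name -> compiled.matcher(name).matches())
        .toArray(String[]::new);
  }
}
```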



Re: [PR] [WIP][SPARK-45880][SQL] Introduce a new TableCatalog.listTable overload th… [spark]

Posted by "panbingkun (via GitHub)" <gi...@apache.org>.

panbingkun commented on PR #43751:
URL: https://github.com/apache/spark/pull/43751#issuecomment-1968709145

   > This is a hard decision. Technically the behavior of LIKE in many commands (`SHOW TABLES LIKE ...`) relies on the underlying catalog, which can be HMS of different versions, or a Hive-compatible metastore service. This is out of Spark's control.
   > 
   > From @panbingkun's [investigation](https://github.com/apache/spark/pull/43751#issuecomment-1953619718), the Hive behavior is actually very weird and different from other mainstream SQL systems (which follow the behavior of the LIKE expression). Hive 4.0 also switches to the more common behavior.
   > 
   > There are some commands for which we implement the LIKE filtering ourselves, following the Hive behavior. Now we are in a hard position:
   > 
   > 1. If we do nothing, then Spark's behavior of LIKE in commands is non-standard and different from other databases. We may also hit future behavior changes if we upgrade to Hive 4.0.
   > 2. If we change the LIKE filtering behavior now, it's a breaking change, and it also leads to inconsistent behaviors, as some commands use Hive to do LIKE filtering.
   > 
   > cc @srielau
   
   After this effort, all commands that support the `LIKE <pattern>` syntax have been changed. I added a configuration `spark.sql.legacy.useVerticalBarAndStarAsWildcardsInLikePattern` (default `false`). When it is `true`, the `LIKE` pattern follows the semantics used before Hive 4 ('*' matches any sequence of characters and '|' separates alternatives). When it is `false`, the behavior follows standard SQL `LIKE` semantics ('%' matches any sequence of characters and '_' matches exactly one character), and the documentation has been updated accordingly.
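A minimal sketch of a SQL LIKE name filter makes the non-legacy semantics concrete. It is hypothetical and not Spark's `filterBySQLLikePattern`. It assumes case-insensitive matching, '\' as the escape character, and that input order is preserved. Under these semantics the legacy pattern `show_t1*|show_t2*` matches nothing, because '*' and '|' are ordinary characters. Conversely, `show_t%` also matches a name such as `showxt3`, because '_' matches any single character.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of the non-legacy (SQL LIKE) filtering semantics.
final class LikePatternFilter {
  private LikePatternFilter() {}

  // Translates a LIKE pattern into an equivalent Java regex.
  static String likeToRegex(String like) {
    StringBuilder regex = new StringBuilder();
    for (int i = 0; i < like.length(); i++) {
      char c = like.charAt(i);
      if (c == '\\' && i + 1 < like.length()) {
        // Escaped character (assumed escape char '\'): match the next char literally.
        regex.append(Pattern.quote(String.valueOf(like.charAt(++i))));
      } else if (c == '%') {
        regex.append(".*");   // any sequence of characters, including none
      } else if (c == '_') {
        regex.append('.');    // exactly one character
      } else {
        regex.append(Pattern.quote(String.valueOf(c)));
      }
    }
    return regex.toString();
  }

  static List<String> filterBySqlLikePattern(List<String> names, String pattern) {
    Pattern regex = Pattern.compile(likeToRegex(pattern),
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    List<String> result = new ArrayList<>();
    for (String name : names) {
      if (regex.matcher(name).matches()) {
        result.add(name); // preserves the input order
      }
    }
    return result;
  }
}
```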


Re: [PR] [SPARK-45880][SQL] Introduce a new TableCatalog.listTable overload th… [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.

cloud-fan commented on code in PR #43751:
URL: https://github.com/apache/spark/pull/43751#discussion_r1495320778


##########
sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TableCatalog.java:
##########
@@ -97,6 +102,28 @@ public interface TableCatalog extends CatalogPlugin {
    */
   Identifier[] listTables(String[] namespace) throws NoSuchNamespaceException;
 
+  /**
+   * List the tables in a namespace from the catalog by pattern string.
+   * <p>
+   * If the catalog supports views, this must return identifiers for only tables and not views.
+   *
+   * @param namespace a multi-part namespace
+   * @param pattern the filter pattern, only '*' and '|' are allowed as wildcards, others will

Review Comment:
   Not related to this PR, but the existing doc is a bit vague. `|` is not a wildcard, right? And `|` is also valid regex syntax. Can we take a look at other databases and see how they document it?



Re: [PR] [WIP][SPARK-45880][SQL] Introduce a new TableCatalog.listTable overload th… [spark]

Posted by "panbingkun (via GitHub)" <gi...@apache.org>.

panbingkun commented on code in PR #43751:
URL: https://github.com/apache/spark/pull/43751#discussion_r1505550273


##########
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala:
##########
@@ -626,7 +626,12 @@ private[client] class Shim_v2_0 extends Shim with Logging {
 
   override def listFunctions(hive: Hive, db: String, pattern: String): Seq[String] = {
     recordHiveCall()
-    hive.getFunctions(db, pattern).asScala.toSeq
+    if (SQLConf.get.legacyUseStarAndVerticalBarAsWildcardsInLikePattern) {

Review Comment:
   This may cause some performance loss, but there seems to be no better way.
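The diff above only shows the start of the new `if` branch, so the following is one plausible reading of the change, sketched under stated assumptions. Suppose the metastore only understands the legacy '*' / '|' pattern syntax. In non-legacy mode the pattern then cannot be pushed down, so every function name in the database is fetched and filtered on the Spark side. `MetastoreClient` and `FunctionLister` are hypothetical stand-ins, not Hive's or Spark's API, and `likeMatcher` is assumed to be built from `pattern` by the caller.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical stand-in for a metastore client; assumed to use legacy '*' / '|' patterns.
interface MetastoreClient {
  List<String> getFunctions(String db, String pattern);
}

final class FunctionLister {
  private FunctionLister() {}

  static List<String> listFunctions(MetastoreClient client, String db, String pattern,
      boolean legacy, Predicate<String> likeMatcher) {
    if (legacy) {
      // The metastore understands the legacy pattern, so it can be pushed down.
      return client.getFunctions(db, pattern);
    }
    // A SQL LIKE pattern cannot be pushed down: fetch every name ("*") and filter
    // locally. This transfers all function names, which is the performance cost.
    return client.getFunctions(db, "*").stream()
        .filter(likeMatcher)
        .collect(Collectors.toList());
  }
}
```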



Re: [PR] [WIP][SPARK-45880][SQL] Introduce a new TableCatalog.listTable overload th… [spark]

Posted by "panbingkun (via GitHub)" <gi...@apache.org>.

panbingkun commented on code in PR #43751:
URL: https://github.com/apache/spark/pull/43751#discussion_r1505582663


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/ShowNamespacesExec.scala:
##########
@@ -53,7 +53,7 @@ case class ShowNamespacesExec(
 
     val rows = new ArrayBuffer[InternalRow]()
     namespaceNames.map { ns =>
-      if (pattern.map(StringUtils.filterPattern(Seq(ns), _).nonEmpty).getOrElse(true)) {
+      if (pattern.forall(StringUtils.filterPattern(Seq(ns), _).nonEmpty)) {

Review Comment:
   This is only a simplification suggested by the IDE: `pattern.map(f).getOrElse(true)` is equivalent to `pattern.forall(f)`.
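The IDE suggestion relies on the fact that, for an `Option`, `map(f).getOrElse(true)` and `forall(f)` always agree. Both are `true` when no pattern is given, and otherwise both return `f(pattern)`. `java.util.Optional` has no `forall`, so a small Java analogue written against it can show the equivalence. The class and method names are illustrative only.

```java
import java.util.Optional;
import java.util.function.Predicate;

// Illustrative Java analogue of Scala's Option.forall; not Spark code.
final class OptionForall {
  private OptionForall() {}

  // Scala: pattern.map(p => matches(p)).getOrElse(true)
  static boolean viaMapGetOrElse(Optional<String> pattern, Predicate<String> matches) {
    return pattern.map(matches::test).orElse(true);
  }

  // Scala: pattern.forall(p => matches(p)); an absent pattern keeps every name.
  static boolean viaForall(Optional<String> pattern, Predicate<String> matches) {
    return !pattern.isPresent() || matches.test(pattern.get());
  }
}
```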



Re: [PR] [SPARK-45880][SQL] Introduce a new TableCatalog.listTable overload th… [spark]

Posted by "panbingkun (via GitHub)" <gi...@apache.org>.

panbingkun commented on PR #43751:
URL: https://github.com/apache/spark/pull/43751#issuecomment-1953619718

   Just for the record, here are my investigation notes:
   1. Snowflake: https://docs.snowflake.com/en/sql-reference/sql/show-tables
   2. ClickHouse: https://clickhouse.com/docs/en/sql-reference/statements/show
   3. MySQL:
      - https://dev.mysql.com/doc/refman/8.0/en/show-tables.html
      - https://dev.mysql.com/doc/refman/8.0/en/pattern-matching.html


Re: [PR] [SPARK-45880][SQL] Introduce a new TableCatalog.listTable overload th… [spark]

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.

github-actions[bot] commented on PR #43751:
URL: https://github.com/apache/spark/pull/43751#issuecomment-1951500516

   We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
   If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!


Re: [PR] [SPARK-45880][SQL] Introduce a new TableCatalog.listTable overload th… [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.

cloud-fan commented on PR #43751:
URL: https://github.com/apache/spark/pull/43751#issuecomment-1954053118

   Yea let's add a legacy config.


Re: [PR] [WIP][SPARK-45880][SQL] Introduce a new TableCatalog.listTable overload th… [spark]

Posted by "panbingkun (via GitHub)" <gi...@apache.org>.

panbingkun commented on code in PR #43751:
URL: https://github.com/apache/spark/pull/43751#discussion_r1505581906


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/ShowTablesExec.scala:
##########
@@ -39,7 +39,7 @@ case class ShowTablesExec(
 
     val tables = catalog.listTables(namespace.toArray)
     tables.map { table =>
-      if (pattern.map(StringUtils.filterPattern(Seq(table.name()), _).nonEmpty).getOrElse(true)) {
+      if (pattern.forall(StringUtils.filterPattern(Seq(table.name()), _).nonEmpty)) {

Review Comment:
   This is only a simplification suggested by the IDE: `pattern.map(f).getOrElse(true)` is equivalent to `pattern.forall(f)`.



Re: [PR] [SPARK-45880][SQL] Introduce a new TableCatalog.listTable overload th… [spark]

Posted by "panbingkun (via GitHub)" <gi...@apache.org>.

panbingkun commented on code in PR #43751:
URL: https://github.com/apache/spark/pull/43751#discussion_r1494414935


##########
sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TableCatalog.java:
##########
@@ -97,6 +102,26 @@ public interface TableCatalog extends CatalogPlugin {
    */
   Identifier[] listTables(String[] namespace) throws NoSuchNamespaceException;
 
+  /**
+   * List the tables in a namespace from the catalog by pattern string.
+   * <p>
+   * If the catalog supports views, this must return identifiers for only tables and not views.
+   *
+   * @param namespace a multi-part namespace
+   * @param pattern the filter pattern, only '*' and '|' are allowed as wildcards, others will
+   *                follow regular expression convention, case-insensitive match and white spaces
+   *                on both ends will be ignored

Review Comment:
   I searched the docs, and the only possibly related page is this one:
   https://spark.apache.org/docs/latest/sql-ref-syntax-qry-select-like.html#parameters
   <img width="910" alt="image" src="https://github.com/apache/spark/assets/15246973/357c77e8-ecdb-404e-b479-b7b7459fd06b">
   
   Perhaps we should explain it in detail here?
   (PS: The first PR that introduced `StringUtils.filterPattern` is https://github.com/apache/spark/pull/12206)



Re: [PR] [WIP][SPARK-45880][SQL] Introduce a new TableCatalog.listTable overload th… [spark]

Posted by "panbingkun (via GitHub)" <gi...@apache.org>.

panbingkun commented on code in PR #43751:
URL: https://github.com/apache/spark/pull/43751#discussion_r1505232316


##########
docs/sql-ref-syntax-aux-show-tables.md:
##########
@@ -40,12 +40,18 @@ SHOW TABLES [ { FROM | IN } database_name ] [ LIKE regex_pattern ]
 
 * **regex_pattern**
 
-     Specifies the regular expression pattern that is used to filter out unwanted tables. 
+     Specifies the regular expression pattern that is used to filter out unwanted tables.

Review Comment:
   Before:
   <img width="914" alt="image" src="https://github.com/apache/spark/assets/15246973/361b73f5-2d0e-4a80-a30a-7b82ef5a8577">
   
   After:
   <img width="897" alt="image" src="https://github.com/apache/spark/assets/15246973/bdb62ff0-8e2b-476a-989a-c875c79b7d77">
   



Re: [PR] [WIP][SPARK-45880][SQL] Introduce a new TableCatalog.listTable overload th… [spark]

Posted by "panbingkun (via GitHub)" <gi...@apache.org>.

panbingkun commented on code in PR #43751:
URL: https://github.com/apache/spark/pull/43751#discussion_r1505263446


##########
sql/core/src/test/resources/sql-tests/analyzer-results/show-tables.sql.out:
##########
@@ -60,37 +60,37 @@ ShowTables [namespace#x, tableName#x, isTemporary#x]
 
 
 -- !query
-SHOW TABLES 'show_t*'
+SHOW TABLES 'show_t%'
 -- !query analysis
-ShowTables show_t*, [namespace#x, tableName#x, isTemporary#x]
+ShowTables show_t%, [namespace#x, tableName#x, isTemporary#x]
 +- ResolvedNamespace V2SessionCatalog(spark_catalog), [showdb]
 
 
 -- !query
-SHOW TABLES LIKE 'show_t1*|show_t2*'

Review Comment:
   The OR syntax represented by | is no longer supported by default.



Re: [PR] [SPARK-45880][SQL] Introduce a new TableCatalog.listTable overload th… [spark]

Posted by "panbingkun (via GitHub)" <gi...@apache.org>.

panbingkun commented on code in PR #43751:
URL: https://github.com/apache/spark/pull/43751#discussion_r1495333432


##########
sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TableCatalog.java:
##########
@@ -97,6 +102,28 @@ public interface TableCatalog extends CatalogPlugin {
    */
   Identifier[] listTables(String[] namespace) throws NoSuchNamespaceException;
 
+  /**
+   * List the tables in a namespace from the catalog by pattern string.
+   * <p>
+   * If the catalog supports views, this must return identifiers for only tables and not views.
+   *
+   * @param namespace a multi-part namespace
+   * @param pattern the filter pattern, only '*' and '|' are allowed as wildcards, others will

Review Comment:
   Okay, let me investigate it.



Re: [PR] [SPARK-45880][SQL] Introduce a new TableCatalog.listTable overload th… [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.

cloud-fan commented on code in PR #43751:
URL: https://github.com/apache/spark/pull/43751#discussion_r1494067449


##########
sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TableCatalog.java:
##########
@@ -97,6 +102,26 @@ public interface TableCatalog extends CatalogPlugin {
    */
   Identifier[] listTables(String[] namespace) throws NoSuchNamespaceException;
 
+  /**
+   * List the tables in a namespace from the catalog by pattern string.
+   * <p>
+   * If the catalog supports views, this must return identifiers for only tables and not views.
+   *
+   * @param namespace a multi-part namespace
+   * @param pattern the filter pattern, only '*' and '|' are allowed as wildcards, others will
+   *                follow regular expression convention, case-insensitive match and white spaces
+   *                on both ends will be ignored

Review Comment:
   do we have a doc page for the pattern string semantics? If we do, we should reference it here.



Re: [PR] [SPARK-45880][SQL] Introduce a new TableCatalog.listTable overload th… [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.

cloud-fan commented on code in PR #43751:
URL: https://github.com/apache/spark/pull/43751#discussion_r1494597611


##########
sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TableCatalog.java:
##########
@@ -97,6 +102,26 @@ public interface TableCatalog extends CatalogPlugin {
    */
   Identifier[] listTables(String[] namespace) throws NoSuchNamespaceException;
 
+  /**
+   * List the tables in a namespace from the catalog by pattern string.
+   * <p>
+   * If the catalog supports views, this must return identifiers for only tables and not views.
+   *
+   * @param namespace a multi-part namespace
+   * @param pattern the filter pattern, only '*' and '|' are allowed as wildcards, others will
+   *                follow regular expression convention, case-insensitive match and white spaces
+   *                on both ends will be ignored

Review Comment:
   yea, if they use the same implementation. But the LIKE pattern doc does not even mention `*`.



Re: [PR] [SPARK-45880][SQL] Introduce a new TableCatalog.listTable overload th… [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.

cloud-fan commented on code in PR #43751:
URL: https://github.com/apache/spark/pull/43751#discussion_r1494598375


##########
sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TableCatalog.java:
##########
@@ -97,6 +102,26 @@ public interface TableCatalog extends CatalogPlugin {
    */
   Identifier[] listTables(String[] namespace) throws NoSuchNamespaceException;
 
+  /**
+   * List the tables in a namespace from the catalog by pattern string.
+   * <p>
+   * If the catalog supports views, this must return identifiers for only tables and not views.
+   *
+   * @param namespace a multi-part namespace
+   * @param pattern the filter pattern, only '*' and '|' are allowed as wildcards, others will
+   *                follow regular expression convention, case-insensitive match and white spaces
+   *                on both ends will be ignored

Review Comment:
   Another option is to document it in the SHOW TABLES doc page.



Re: [PR] [WIP][SPARK-45880][SQL] Introduce a new TableCatalog.listTable overload th… [spark]

Posted by "panbingkun (via GitHub)" <gi...@apache.org>.

panbingkun commented on code in PR #43751:
URL: https://github.com/apache/spark/pull/43751#discussion_r1505578877


##########
sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkGetFunctionsOperation.scala:
##########
@@ -81,7 +87,7 @@ private[hive] class SparkGetFunctionsOperation(
 
     try {
       matchingDbs.foreach { db =>
-        catalog.listFunctions(db, functionPattern).foreach {
+        catalog.listFunctions(db, functionPattern).sortBy { item => item._1.funcName }.foreach {

Review Comment:
   Sorting by function name makes the order of the returned results stable, because the listing involves a `HashMap`, whose iteration order is not guaranteed.
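The following minimal sketch shows why sorting helps, assuming the functions come from a `HashMap` keyed by name. Iterating the map directly yields an unspecified order, and sorting by name makes that order deterministic. `StableFunctionListing` and `FunctionEntry` are hypothetical and do not correspond to Spark's classes.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; FunctionEntry is illustrative, not Spark's FunctionIdentifier.
final class StableFunctionListing {
  private StableFunctionListing() {}

  static final class FunctionEntry {
    final String funcName;
    final String className;

    FunctionEntry(String funcName, String className) {
      this.funcName = funcName;
      this.className = className;
    }
  }

  static List<FunctionEntry> sortedByName(Map<String, String> registry) {
    List<FunctionEntry> entries = new ArrayList<>();
    // For a HashMap, this iteration order is unspecified and may differ between runs/JVMs.
    for (Map.Entry<String, String> e : registry.entrySet()) {
      entries.add(new FunctionEntry(e.getKey(), e.getValue()));
    }
    // Sorting by name gives a deterministic order, like the sortBy in the diff above.
    entries.sort(Comparator.comparing((FunctionEntry entry) -> entry.funcName));
    return entries;
  }
}
```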



Re: [PR] [SPARK-45880][SQL] Introduce a new TableCatalog.listTable overload th… [spark]

Posted by "panbingkun (via GitHub)" <gi...@apache.org>.

panbingkun commented on PR #43751:
URL: https://github.com/apache/spark/pull/43751#issuecomment-1805361784

   cc @cloud-fan 


Re: [PR] [SPARK-45880][SQL] Introduce a new TableCatalog.listTable overload th… [spark]

Posted by "panbingkun (via GitHub)" <gi...@apache.org>.

panbingkun commented on PR #43751:
URL: https://github.com/apache/spark/pull/43751#issuecomment-1953931188

   > I think it's more natural to follow the same behavior of the LIKE operator here. It seems all databases follow it (except for Hive before 4.0). Spark followed Hive at the beginning and that's probably why Spark has this special and weird behavior for the LIKE pattern in SHOW TABLES.
   > 
   > In fact, this is out of Spark's control, as it's the external catalog that applies the pattern string. We should follow the industry standard for defining the v2 catalog API. We should also update the SHOW TABLES doc page to mention the ideal behavior of the pattern string, as well as the legacy Hive behavior.
   
   Okay, I will handle it in this PR and update the documentation.
   Additionally, should we add a legacy configuration (defaulting to the new behavior) that determines whether the legacy or the new behavior is used?
   (PS: Yes, the first PR shows that the original author's intention was to respect the legacy Hive behavior. Screenshot: https://github.com/apache/spark/assets/15246973/6c753a76-f134-41be-953d-ba9ade46bf6d)



Re: [PR] [SPARK-45880][SQL] Introduce a new TableCatalog.listTable overload th… [spark]

Posted by "panbingkun (via GitHub)" <gi...@apache.org>.

panbingkun commented on PR #43751:
URL: https://github.com/apache/spark/pull/43751#issuecomment-1958579779

   Just to record the SQL commands in Spark that use `StringUtils.filterPattern`:
   |SQL Command|Example|
   |---|---|
   |||



Re: [PR] [SPARK-45880][SQL] Introduce a new TableCatalog.listTable overload th… [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.

cloud-fan commented on PR #43751:
URL: https://github.com/apache/spark/pull/43751#issuecomment-1953823540

   I think it's more natural to follow the same behavior of the LIKE operator here. It seems all databases follow it (except for Hive before 4.0). Spark followed Hive at the beginning and that's probably why Spark has this special and weird behavior for the LIKE pattern in SHOW TABLES.
   
   In fact, this is out of Spark's control, as it's the external catalog that applies the pattern string. We should follow the industry standard for defining the v2 catalog API. We should also update the SHOW TABLES doc page to mention the ideal behavior of the pattern string, as well as the legacy Hive behavior.



Re: [PR] [WIP][SPARK-45880][SQL] Introduce a new TableCatalog.listTable overload th… [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.

cloud-fan commented on PR #43751:
URL: https://github.com/apache/spark/pull/43751#issuecomment-1968294284

This is a hard decision. Technically, the behavior of LIKE in many commands (`SHOW TABLES LIKE ...`) depends on the underlying catalog, which can be HMS of different versions or a Hive-compatible metastore service. This is out of Spark's control.

From @panbingkun's [investigation](https://github.com/apache/spark/pull/43751#issuecomment-1953619718), the Hive behavior is actually very weird and different from other mainstream SQL systems (which follow the behavior of the LIKE expression). Hive 4.0 also switches to the more common behavior.

For some commands, we implement the LIKE filtering ourselves, following the Hive behavior. Now we are in a hard position:
1. If we do nothing, Spark's behavior of LIKE in commands remains non-standard and different from other databases. We may also hit future behavior changes if we upgrade to Hive 4.0.
2. If we change the LIKE filtering behavior now, it's a breaking change, and it also leads to inconsistent behaviors, as some commands use Hive to do LIKE filtering.
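To make the breaking change in option 2 concrete, here is a hedged Java sketch that applies both semantics to the same names. `LikeSemanticsDemo`, `legacyFilter` and `sqlLikeFilter` are hypothetical names, not Spark APIs: the legacy version follows the Hive-style rules (`*` matches anything, `|` separates alternatives, case-insensitive), and the SQL version treats `%` and `_` as LIKE wildcards and every other character as a literal (escape handling omitted).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical illustration of the two pattern semantics; not Spark's implementation.
final class LikeSemanticsDemo {
  private LikeSemanticsDemo() {}

  // Legacy Hive-style matching: '*' matches any characters, '|' separates
  // alternatives, the rest is a Java regex; case-insensitive, result sorted.
  static List<String> legacyFilter(List<String> names, String pattern) {
    TreeSet<String> matched = new TreeSet<>();
    for (String sub : pattern.trim().split("\\|")) {
      try {
        Pattern p = Pattern.compile("(?i)" + sub.replaceAll("\\*", ".*"));
        for (String name : names) {
          if (p.matcher(name).matches()) {
            matched.add(name);
          }
        }
      } catch (PatternSyntaxException e) {
        // An invalid alternative is skipped rather than failing the whole filter.
      }
    }
    return new ArrayList<>(matched);
  }

  // SQL LIKE matching: '%' matches any sequence, '_' exactly one character,
  // every other character is literal; case-insensitive, input order kept.
  // Escape sequences such as "\_" are deliberately not handled here.
  static List<String> sqlLikeFilter(List<String> names, String pattern) {
    StringBuilder regex = new StringBuilder();
    for (char c : pattern.toCharArray()) {
      if (c == '%') {
        regex.append(".*");
      } else if (c == '_') {
        regex.append('.');
      } else {
        regex.append(Pattern.quote(String.valueOf(c)));
      }
    }
    Pattern p = Pattern.compile(regex.toString(), Pattern.CASE_INSENSITIVE);
    List<String> out = new ArrayList<>();
    for (String name : names) {
      if (p.matcher(name).matches()) {
        out.add(name);
      }
    }
    return out;
  }
}
```

With the names `view_1`, `view_2` and `viewA1`, the legacy pattern `view_*` returns `view_1` and `view_2`, while under LIKE semantics the same string matches nothing (the `*` is literal), and `view_%` also matches `viewA1` because `_` matches any single character.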

cc @srielau


Re: [PR] [WIP][SPARK-45880][SQL] Introduce a new TableCatalog.listTable overload th… [spark]

Posted by "panbingkun (via GitHub)" <gi...@apache.org>.

panbingkun commented on code in PR #43751:
URL: https://github.com/apache/spark/pull/43751#discussion_r1505253654


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala:
##########
@@ -107,27 +107,82 @@ object StringUtils extends Logging {
   def isFalseString(s: UTF8String): Boolean = falseStrings.contains(s.trimAll().toLowerCase)
   // scalastyle:on caselocale
 
+  def getAllMatchWildcard: String = {
+    if (SQLConf.get.legacyUseStarAndVerticalBarAsWildcardsInLikePattern) {
+      "*"
+    } else {
+      "%"
+    }
+  }
+
+  def filterPattern(names: Seq[String], pattern: String): Seq[String] = {
+    if (SQLConf.get.legacyUseStarAndVerticalBarAsWildcardsInLikePattern) {
+      filterPatternLegacy(names, pattern)
+    } else {
+      filterBySQLLikePattern(names, pattern)
+    }
+  }
+
   /**
-   * This utility can be used for filtering pattern in the "Like" of "Show Tables / Functions" DDL
+   * This legacy utility can be used for filtering pattern in the "Like" of
+   * "Show Tables / Functions" DDL.
    * @param names the names list to be filtered
    * @param pattern the filter pattern, only '*' and '|' are allowed as wildcards, others will
    *                follow regular expression convention, case insensitive match and white spaces
    *                on both ends will be ignored
    * @return the filtered names list in order
    */
-  def filterPattern(names: Seq[String], pattern: String): Seq[String] = {
+  def filterPatternLegacy(names: Seq[String], pattern: String): Seq[String] = {
     val funcNames = scala.collection.mutable.SortedSet.empty[String]
     pattern.trim().split("\\|").foreach { subPattern =>
       try {
         val regex = ("(?i)" + subPattern.replaceAll("\\*", ".*")).r
-        funcNames ++= names.filter{ name => regex.pattern.matcher(name).matches() }
+        funcNames ++= names.filter { name => regex.pattern.matcher(name).matches() }
       } catch {
         case _: PatternSyntaxException =>
       }
     }
     funcNames.toSeq
   }
 
+  /**
+   * This utility can be used for filtering pattern in the "Like" of "Show Tables / Functions" DDL.
+   * @param names the names list to be filtered
+   * @param pattern the filter pattern, SQL type like expressions:
+   *                '%' for any character(s), and '_' for a single character
+   * @return the filtered names list
+   */
+  def filterBySQLLikePattern(names: Seq[String], pattern: String): Seq[String] = {
+    try {
+      val p = Pattern.compile(likePatternToRegExp(pattern), Pattern.CASE_INSENSITIVE)
+      names.filter { name => p.matcher(name).matches() }
+    } catch {
+      case _: PatternSyntaxException => Seq.empty[String]
+    }
+  }
+
+  private[util] def likePatternToRegExp(pattern: String): String = {

Review Comment:
   This follows the implementation logic of Hive's `UDFLike`:
   https://github.com/apache/hive/blob/89005659d1d8e167208ba4f9f9aaa2de7703229d/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFLike.java#L64-L86
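A minimal Java sketch of such a conversion, loosely modeled on the linked Hive logic rather than copied from it (`LikeToRegex` is a hypothetical name): `\_` and `\%` stand for a literal underscore and percent sign, `_` becomes `.`, `%` becomes `.*`, and every other character is quoted so that regex metacharacters in names are matched literally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch; not the Spark or Hive source.
final class LikeToRegex {
  private LikeToRegex() {}

  // Converts a SQL LIKE pattern into an equivalent Java regex.
  static String likePatternToRegExp(String likePattern) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < likePattern.length(); i++) {
      char c = likePattern.charAt(i);
      if (c == '\\' && i + 1 < likePattern.length()
          && (likePattern.charAt(i + 1) == '_' || likePattern.charAt(i + 1) == '%')) {
        // "\_" and "\%" denote a literal '_' / '%'; neither is special in Java regex.
        sb.append(likePattern.charAt(i + 1));
        i++;
      } else if (c == '_') {
        sb.append('.');   // exactly one character
      } else if (c == '%') {
        sb.append(".*");  // any sequence, including the empty one
      } else {
        sb.append(Pattern.quote(String.valueOf(c))); // everything else is literal
      }
    }
    return sb.toString();
  }

  // Case-insensitive filter in the spirit of filterBySQLLikePattern; keeps input order.
  static List<String> filter(List<String> names, String likePattern) {
    Pattern p = Pattern.compile(likePatternToRegExp(likePattern), Pattern.CASE_INSENSITIVE);
    List<String> out = new ArrayList<>();
    for (String name : names) {
      if (p.matcher(name).matches()) {
        out.add(name);
      }
    }
    return out;
  }
}
```

For example, `view\_1` matches only `view_1`, while `view_%` also matches `viewA1`. Because every literal character is quoted, `Pattern.compile` cannot fail on user input in this sketch, so it needs no `PatternSyntaxException` handler.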




Re: [PR] [WIP][SPARK-45880][SQL] Introduce a new TableCatalog.listTable overload th… [spark]

Posted by "panbingkun (via GitHub)" <gi...@apache.org>.

panbingkun commented on code in PR #43751:
URL: https://github.com/apache/spark/pull/43751#discussion_r1505518259


##########
sql/core/src/test/scala/org/apache/spark/sql/execution/command/ShowFunctionsSuiteBase.scala:
##########
@@ -159,15 +160,18 @@ trait ShowFunctionsSuiteBase extends QueryTest with DDLCommandTestUtils {
 
   test("show functions matched to the '|' pattern") {
     val testFuns = Seq("crc32i", "crc16j", "date1900", "Date1")
-    withNamespaceAndFuns("ns", testFuns) { (ns, funs) =>
-      assert(sql(s"SHOW USER FUNCTIONS IN $ns").isEmpty)
-      funs.foreach(createFunction)
-      QueryTest.checkAnswer(
-        sql(s"SHOW USER FUNCTIONS IN $ns LIKE 'crc32i|date1900'"),
-        Seq("crc32i", "date1900").map(testFun => Row(qualifiedFunName("ns", testFun))))
-      QueryTest.checkAnswer(
-        sql(s"SHOW USER FUNCTIONS IN $ns LIKE 'crc32i|date*'"),
-        Seq("crc32i", "date1900", "Date1").map(testFun => Row(qualifiedFunName("ns", testFun))))
+    withSQLConf(

Review Comment:
   Since this UT tests "|", which is no longer supported in the new mode, we set the configuration `spark.sql.legacy.useVerticalBarAndStarAsWildcardsInLikePattern` to true for this test. In the future, we can consider removing this UT.




Re: [PR] [WIP][SPARK-45880][SQL] Introduce a new TableCatalog.listTable overload th… [spark]

Posted by "panbingkun (via GitHub)" <gi...@apache.org>.

panbingkun commented on code in PR #43751:
URL: https://github.com/apache/spark/pull/43751#discussion_r1505561012


##########
sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/operation/MetadataOperation.java:
##########
@@ -57,54 +57,6 @@ public void close() throws HiveSQLException {
     cleanupOperationLog();
   }
 
-  /**

Review Comment:
   The following method is extracted into a separate class, `MetadataOperationUtils`, and renamed to `legacyXXX`.


