Posted to reviews@spark.apache.org by "Hisoka-X (via GitHub)" <gi...@apache.org> on 2023/05/10 02:14:27 UTC

[GitHub] [spark] Hisoka-X opened a new pull request, #41111: [SPARK-39420][SQL] Support `ANALYZE TABLE` on Datasource V2 tables

Hisoka-X opened a new pull request, #41111:
URL: https://github.com/apache/spark/pull/41111

   
   ### What changes were proposed in this pull request?
   
   Support ANALYZE TABLE on v2 tables.
   
   With this PR, users can run the `ANALYZE TABLE` statement against DataSource V2 tables. Because the DataSource V2 API does not support the `NOSCAN` and `PARTITION` options, `ANALYZE TABLE` statements that use either option currently report an error.
   
   The statistics obtained through the analysis are stored in the `SessionState`, where `DESC EXTENDED` statements can read them. In the future, the statistics held in the `SessionState` could also be provided to `DataSourceV2Relation` to avoid collecting the same statistics repeatedly.
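   
   As a rough illustration of session-scoped statistics storage, here is a minimal sketch in plain Scala with no Spark dependency. `TableStats` and `SessionStatsCache` are hypothetical names invented for this example; the `StatisticsCache` class in the PR may look quite different.
   
   ```scala
   import scala.collection.concurrent.TrieMap
   
   // Hypothetical stand-in for the statistics an ANALYZE TABLE run would collect.
   final case class TableStats(sizeInBytes: Long, rowCount: Option[Long])
   
   // Hypothetical session-scoped cache keyed by the multi-part table identifier.
   // Entries live only as long as the owning session object.
   final class SessionStatsCache {
     private val stats = TrieMap.empty[Seq[String], TableStats]
   
     // Record fresh statistics after an ANALYZE TABLE run.
     def put(ident: Seq[String], s: TableStats): Unit = stats.update(ident, s)
   
     // Look up statistics, e.g. for DESC EXTENDED; None means "never analyzed".
     def get(ident: Seq[String]): Option[TableStats] = stats.get(ident)
   
     // Drop statistics when the table changes so stale values are not shown.
     def invalidate(ident: Seq[String]): Unit = {
       stats.remove(ident)
       ()
     }
   }
   
   object SessionStatsCacheDemo {
     def main(args: Array[String]): Unit = {
       val cache = new SessionStatsCache
       val ident = Seq("testcat", "ns", "tbl")
       cache.put(ident, TableStats(sizeInBytes = 1024L, rowCount = Some(10L)))
       println(cache.get(ident))
       cache.invalidate(ident)
       println(cache.get(ident))
     }
   }
   ```
   
   In this sketch the statistics disappear with the session and are not visible to other sessions, which is the main trade-off of keeping them outside a catalog.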
   
   
   
   ### Why are the changes needed?
   To align `ANALYZE TABLE` support between DataSource V1 and V2.
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   Add new test
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Hisoka-X commented on a diff in pull request #41111: [SPARK-39420][SQL] Support `ANALYZE TABLE` on Datasource V2 tables

Posted by "Hisoka-X (via GitHub)" <gi...@apache.org>.
Hisoka-X commented on code in PR #41111:
URL: https://github.com/apache/spark/pull/41111#discussion_r1249070329


##########
core/src/main/resources/error/error-classes.json:
##########
@@ -1973,6 +1973,12 @@
       "The sum of the LIMIT clause and the OFFSET clause must not be greater than the maximum 32-bit integer value (2,147,483,647) but found limit = <limit>, offset = <offset>."
     ]
   },
+  "TABLE_NOT_SUPPORTED_STATISTIC" : {

Review Comment:
   Thanks for the reminder, done.



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala:
##########
@@ -1254,6 +1254,12 @@ private[sql] object QueryExecutionErrors extends QueryErrorsBase {
       messageParameters = Map("o" -> o.toString()))
   }
 
+  def tableNotSupportedStatisticError(tableName: String): SparkIllegalArgumentException = {
+    new SparkIllegalArgumentException(
+      errorClass = "TABLE_NOT_SUPPORTED_STATISTIC",
+      messageParameters = Map("tableName" -> tableName))

Review Comment:
   done





Re: [PR] [SPARK-39420][SQL] Support `ANALYZE TABLE` on Datasource V2 tables [spark]

Posted by "Hisoka-X (via GitHub)" <gi...@apache.org>.
Hisoka-X commented on code in PR #41111:
URL: https://github.com/apache/spark/pull/41111#discussion_r1365059135


##########
sql/core/src/main/scala/org/apache/spark/sql/internal/SessionState.scala:
##########
@@ -76,6 +76,7 @@ private[sql] class SessionState(
     val sqlParser: ParserInterface,
     analyzerBuilder: () => Analyzer,
     optimizerBuilder: () => Optimizer,
+    statisticsCacheBuilder: () => StatisticsCache,

Review Comment:
   I am also a little unsure about this. Since V2Catalog does not have an interface for storing stats (and adding one does not seem suitable), stats can only be stored in SessionCatalog or in the session. Is it appropriate to put the V2 table information in SessionCatalog? Or should it go into V2SessionCatalog (though I'm worried about breaking its role, which currently seems to be just a translation of SessionCatalog to V2Catalog)? Please give me some suggestions. Thanks.





Re: [PR] [SPARK-39420][SQL] Support `ANALYZE TABLE` on Datasource V2 tables [spark]

Posted by "felipepessoto (via GitHub)" <gi...@apache.org>.
felipepessoto commented on PR #41111:
URL: https://github.com/apache/spark/pull/41111#issuecomment-2037801469

   @MaxGekk, @Hisoka-X, do you have any plans to reconsider this?
   
   I think the ANALYZE command is still important, even for Delta. By default it only collects stats for 32 columns, and it doesn't have all the stats, for example the distinct count (which is important for query planning) or histograms.
   
   




[GitHub] [spark] jieyu-lin commented on pull request #41111: [SPARK-39420][SQL] Support `ANALYZE TABLE` on Datasource V2 tables

Posted by "jieyu-lin (via GitHub)" <gi...@apache.org>.
jieyu-lin commented on PR #41111:
URL: https://github.com/apache/spark/pull/41111#issuecomment-1558435904

   Hi @Hisoka-X 
   
   Good day. I would like to know whether this pull request applies to Spark SQL queries against Delta Lake tables (V2).
   I ran into this issue with Spark 3.2 (Delta Lake 1.2) and Spark 3.3 (Delta Lake 2.2), and I think this pull request could help resolve it.
   
   ![image](https://github.com/apache/spark/assets/51392682/591ada27-5525-402e-995d-14f5c3c3f11f)
   
   ![image](https://github.com/apache/spark/assets/51392682/0908e173-325e-4057-be6b-c9ae5e990f8f)
   
   I see that the ANALYZE TABLE command originates from Spark: https://spark.apache.org/docs/3.3.1/sql-ref-syntax-aux-analyze-table.html#analyze-table. However, the Delta Lake documentation does not mention ANALYZE TABLE usage: https://docs.delta.io/2.2.0/delta-intro.html.
   
   Since this pull request will be merged into the master branch, I would like to know **whether it will also be backported to Spark 3.2 and Spark 3.3**.
   
   Thank you
   
   Best regards,
   Jerry




[GitHub] [spark] Hisoka-X commented on pull request #41111: [SPARK-39420][SQL] Support `ANALYZE TABLE` on Datasource V2 tables

Posted by "Hisoka-X (via GitHub)" <gi...@apache.org>.
Hisoka-X commented on PR #41111:
URL: https://github.com/apache/spark/pull/41111#issuecomment-1547474099

   cc @MaxGekk @cloud-fan @hvanhovell 




Re: [PR] [SPARK-39420][SQL] Support `ANALYZE TABLE` on Datasource V2 tables [spark]

Posted by "Hisoka-X (via GitHub)" <gi...@apache.org>.
Hisoka-X commented on code in PR #41111:
URL: https://github.com/apache/spark/pull/41111#discussion_r1366309059


##########
sql/core/src/main/scala/org/apache/spark/sql/internal/SessionState.scala:
##########
@@ -76,6 +76,7 @@ private[sql] class SessionState(
     val sqlParser: ParserInterface,
     analyzerBuilder: () => Analyzer,
     optimizerBuilder: () => Optimizer,
+    statisticsCacheBuilder: () => StatisticsCache,

Review Comment:
   Got it. Thanks for the explanation!





[GitHub] [spark] Hisoka-X commented on pull request #41111: [SPARK-39420][SQL] Support `ANALYZE TABLE` on Datasource V2 tables

Posted by "Hisoka-X (via GitHub)" <gi...@apache.org>.
Hisoka-X commented on PR #41111:
URL: https://github.com/apache/spark/pull/41111#issuecomment-1657416580

   @viirya @wangyum Would you mind taking a look?




[GitHub] [spark] Hisoka-X commented on pull request #41111: [SPARK-39420][SQL] Support `ANALYZE TABLE` on Datasource V2 tables

Posted by "Hisoka-X (via GitHub)" <gi...@apache.org>.
Hisoka-X commented on PR #41111:
URL: https://github.com/apache/spark/pull/41111#issuecomment-1632091918

   kindly ping @MaxGekk @cloud-fan 




Re: [PR] [SPARK-39420][SQL] Support `ANALYZE TABLE` on Datasource V2 tables [spark]

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk commented on code in PR #41111:
URL: https://github.com/apache/spark/pull/41111#discussion_r1363642867


##########
sql/core/src/main/scala/org/apache/spark/sql/internal/SessionState.scala:
##########
@@ -76,6 +76,7 @@ private[sql] class SessionState(
     val sqlParser: ParserInterface,
     analyzerBuilder: () => Analyzer,
     optimizerBuilder: () => Optimizer,
+    statisticsCacheBuilder: () => StatisticsCache,

Review Comment:
   This confuses me. Why do you maintain statistics here in the session state rather than in a catalog, for instance?
   It should be similar to the V1 command, I think:
   https://github.com/apache/spark/blob/4a35a31c038f726f9329b4f28f3dde87286fb8d2/sql/core/src/main/scala/org/apache/spark/sql/execution/command/CommandUtils.scala#L242





Re: [PR] [SPARK-39420][SQL] Support `ANALYZE TABLE` on Datasource V2 tables [spark]

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk commented on code in PR #41111:
URL: https://github.com/apache/spark/pull/41111#discussion_r1365078762


##########
sql/core/src/main/scala/org/apache/spark/sql/internal/SessionState.scala:
##########
@@ -76,6 +76,7 @@ private[sql] class SessionState(
     val sqlParser: ParserInterface,
     analyzerBuilder: () => Analyzer,
     optimizerBuilder: () => Optimizer,
+    statisticsCacheBuilder: () => StatisticsCache,

Review Comment:
   cc @cloud-fan








Re: [PR] [SPARK-39420][SQL] Support `ANALYZE TABLE` on Datasource V2 tables [spark]

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk commented on code in PR #41111:
URL: https://github.com/apache/spark/pull/41111#discussion_r1360633362


##########
common/utils/src/main/resources/error/error-classes.json:
##########
@@ -3177,6 +3177,11 @@
           "<variableName> is a VARIABLE and cannot be updated using the SET statement. Use SET VARIABLE <variableName> = ... instead."
         ]
       },
+      "TABLE_NOT_SUPPORTED_STATISTIC" : {
+        "message" : [
+          "Cannot gather statistics for the table <tableName> because it does not the feature."

Review Comment:
   The parent error class already states that the feature is unsupported, see:
   ```json
     "UNSUPPORTED_FEATURE" : {
       "message" : [
         "The feature is not supported:"
       ],
   ```
   Please remove the tail:
   ```suggestion
             "Gather statistics for the table <tableName>."
   ```
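   
   To make the reasoning concrete, here is a simplified sketch of how a parent message and a sub-class message could be joined and their placeholders filled in. `ErrorMessageSketch`, `fill`, and `render` are hypothetical names; this is not Spark's actual error-class reader, and the exact output format is an assumption.
   
   ```scala
   // Hypothetical, simplified model of a parent error class with sub-classes.
   object ErrorMessageSketch {
     // Parent message, taken from the JSON snippet quoted in the review.
     val parentMessage: String = "The feature is not supported:"
   
     // Sub-class templates; TABLE_STATISTICS uses the reviewer's suggested text.
     val subClassMessages: Map[String, String] = Map(
       "TABLE_STATISTICS" -> "Gather statistics for the table <tableName>."
     )
   
     // Replace each <name> placeholder with the matching parameter value.
     def fill(template: String, params: Map[String, String]): String =
       params.foldLeft(template) { case (msg, (k, v)) => msg.replace(s"<${k}>", v) }
   
     // Join parent and sub-class text, then substitute the parameters.
     def render(subClass: String, params: Map[String, String]): String = {
       val template = s"$parentMessage ${subClassMessages(subClass)}"
       s"[UNSUPPORTED_FEATURE.$subClass] ${fill(template, params)}"
     }
   
     def main(args: Array[String]): Unit =
       println(render("TABLE_STATISTICS", Map("tableName" -> "`testcat`.`ns`.`tbl`")))
   }
   ```
   
   Because the parent text already says "not supported", a sub-class message that repeats it would read redundantly once the two are joined.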



##########
common/utils/src/main/resources/error/error-classes.json:
##########
@@ -3177,6 +3177,11 @@
           "<variableName> is a VARIABLE and cannot be updated using the SET statement. Use SET VARIABLE <variableName> = ... instead."
         ]
       },
+      "TABLE_NOT_SUPPORTED_STATISTIC" : {

Review Comment:
   ```suggestion
         "TABLE_STATISTICS" : {
   ```





Re: [PR] [SPARK-39420][SQL] Support `ANALYZE TABLE` on Datasource V2 tables [spark]

Posted by "jalberti (via GitHub)" <gi...@apache.org>.
jalberti commented on PR #41111:
URL: https://github.com/apache/spark/pull/41111#issuecomment-1773816619

   Why not add a new `SupportsGatherStatistics` interface so that V2 sources can implement what `ANALYZE TABLE` needs? Or do we expect each V2 source to provide a SQL extension with custom functions? In that case, when different V2 providers are used together, each would have to implement its own functions, and users would need to know what kind of table they are dealing with. `ANALYZE TABLE` seems like a good abstraction, just as `CREATE` differs between sources only in its options.
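   
   A minimal sketch of what such a capability interface might look like, in plain Scala with no Spark dependency. Everything here is hypothetical: Spark has no `SupportsGatherStatistics` interface, and `GatheredStats`, `analyze`, and the table classes are invented for illustration only.
   
   ```scala
   // Everything in this object is hypothetical and only illustrates the idea.
   object GatherStatsSketch {
     final case class GatheredStats(rowCount: Long, sizeInBytes: Long)
   
     // Stand-in for a V2 table.
     trait Table { def name: String }
   
     // Hypothetical capability mix-in: a table that knows how to analyze itself.
     trait SupportsGatherStatistics extends Table {
       def gatherStatistics(): GatheredStats
     }
   
     // Generic ANALYZE TABLE dispatch: delegate when the capability is present,
     // otherwise report that the table does not support the command.
     def analyze(table: Table): Either[String, GatheredStats] = table match {
       case t: SupportsGatherStatistics => Right(t.gatherStatistics())
       case other => Left(s"ANALYZE TABLE is not supported for table ${other.name}")
     }
   
     // A table without the capability.
     final class PlainTable(val name: String) extends Table
   
     // A toy table that counts its rows and sums their string lengths as "size".
     final class AnalyzableTable(val name: String, rows: Seq[String])
         extends SupportsGatherStatistics {
       def gatherStatistics(): GatheredStats =
         GatheredStats(rows.size.toLong, rows.map(_.length.toLong).sum)
     }
   
     def main(args: Array[String]): Unit = {
       println(analyze(new AnalyzableTable("t", Seq("ab", "cde"))))
       println(analyze(new PlainTable("p")))
     }
   }
   ```
   
   With this shape the user-facing command stays the same for every provider, and only the implementation behind the mix-in differs.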




[GitHub] [spark] MaxGekk commented on a diff in pull request #41111: [SPARK-39420][SQL] Support `ANALYZE TABLE` on Datasource V2 tables

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk commented on code in PR #41111:
URL: https://github.com/apache/spark/pull/41111#discussion_r1252245503


##########
common/utils/src/main/resources/error/error-classes.json:
##########
@@ -2635,6 +2635,11 @@
           "<property> is a reserved table property, <msg>."
         ]
       },
+      "TABLE_NOT_SUPPORTED_STATISTIC" : {
+        "message" : [
+          "Cannot statistic table <tableName> because it unsupported."

Review Comment:
   ```suggestion
             "Cannot gather statistics for the table <tableName> because it does not the feature."
   ```





[GitHub] [spark] jieyu-lin commented on pull request #41111: [SPARK-39420][SQL] Support `ANALYZE TABLE` on Datasource V2 tables

Posted by "jieyu-lin (via GitHub)" <gi...@apache.org>.
jieyu-lin commented on PR #41111:
URL: https://github.com/apache/spark/pull/41111#issuecomment-1566359887

   Hi @HyukjinKwon ,
   
   I would like to know whether this feature will also be backported to Spark 3.2 and Spark 3.3. This is important for us as we review our OSS Spark version.
   
   Best regards,
   Jerry Lin




Re: [PR] [SPARK-39420][SQL] Support `ANALYZE TABLE` on Datasource V2 tables [spark]

Posted by "Hisoka-X (via GitHub)" <gi...@apache.org>.
Hisoka-X commented on code in PR #41111:
URL: https://github.com/apache/spark/pull/41111#discussion_r1363025013


##########
common/utils/src/main/resources/error/error-classes.json:
##########
@@ -3177,6 +3177,11 @@
           "<variableName> is a VARIABLE and cannot be updated using the SET statement. Use SET VARIABLE <variableName> = ... instead."
         ]
       },
+      "TABLE_NOT_SUPPORTED_STATISTIC" : {
+        "message" : [
+          "Cannot gather statistics for the table <tableName> because it does not the feature."

Review Comment:
   Done.





[GitHub] [spark] Hisoka-X commented on pull request #41111: [SPARK-39420][SQL] Support `ANALYZE TABLE` on Datasource V2 tables

Posted by "Hisoka-X (via GitHub)" <gi...@apache.org>.
Hisoka-X commented on PR #41111:
URL: https://github.com/apache/spark/pull/41111#issuecomment-1558455448

   > I would like to know whether this pull request applies to Spark SQL queries against Delta Lake tables (V2).
   
   This PR applies to every DataSource V2 source that supports the `SupportsReportStatistics` interface.
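   
   As a sketch of what relying on `SupportsReportStatistics` could mean, the snippet below uses stand-in traits that mirror the shape of Spark's connector read interfaces (`Scan`, `SupportsReportStatistics`, `Statistics`) so that it runs without Spark on the classpath. `ReportStatsSketch`, `collect`, and `SizedScan` are hypothetical names, and how the PR actually reads the statistics is not shown in this thread.
   
   ```scala
   import java.util.OptionalLong
   
   object ReportStatsSketch {
     // Stand-ins mirroring the shape of Spark's connector read interfaces.
     trait Statistics {
       def sizeInBytes(): OptionalLong
       def numRows(): OptionalLong
     }
     trait Scan
     trait SupportsReportStatistics extends Scan {
       def estimateStatistics(): Statistics
     }
   
     // Convert Java's OptionalLong into a Scala Option.
     private def toOption(v: OptionalLong): Option[Long] =
       if (v.isPresent) Some(v.getAsLong) else None
   
     // Returns (sizeInBytes, numRows) when the scan can report statistics,
     // and None when the source does not implement the interface.
     def collect(scan: Scan): Option[(Option[Long], Option[Long])] = scan match {
       case s: SupportsReportStatistics =>
         val st = s.estimateStatistics()
         Some((toOption(st.sizeInBytes()), toOption(st.numRows())))
       case _ => None
     }
   
     // Example scan that knows its size but not its row count.
     final class SizedScan(bytes: Long) extends SupportsReportStatistics {
       def estimateStatistics(): Statistics = new Statistics {
         def sizeInBytes(): OptionalLong = OptionalLong.of(bytes)
         def numRows(): OptionalLong = OptionalLong.empty()
       }
     }
   
     def main(args: Array[String]): Unit = {
       println(collect(new SizedScan(2048L)))
       println(collect(new Scan {}))
     }
   }
   ```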
   
   > Since this pull request will be merged into the master branch, I would like to know whether it will also be backported to Spark 3.2 and Spark 3.3.
   
   I'm not sure about this; it seems it will only be merged into master. @HyukjinKwon Can you help answer this question? Thanks.




[GitHub] [spark] MaxGekk commented on a diff in pull request #41111: [SPARK-39420][SQL] Support `ANALYZE TABLE` on Datasource V2 tables

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk commented on code in PR #41111:
URL: https://github.com/apache/spark/pull/41111#discussion_r1248727994


##########
core/src/main/resources/error/error-classes.json:
##########
@@ -1973,6 +1973,12 @@
       "The sum of the LIMIT clause and the OFFSET clause must not be greater than the maximum 32-bit integer value (2,147,483,647) but found limit = <limit>, offset = <offset>."
     ]
   },
+  "TABLE_NOT_SUPPORTED_STATISTIC" : {

Review Comment:
   Could you add it as a sub-class of `UNSUPPORTED_FEATURE`, please?



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala:
##########
@@ -1254,6 +1254,12 @@ private[sql] object QueryExecutionErrors extends QueryErrorsBase {
       messageParameters = Map("o" -> o.toString()))
   }
 
+  def tableNotSupportedStatisticError(tableName: String): SparkIllegalArgumentException = {
+    new SparkIllegalArgumentException(
+      errorClass = "TABLE_NOT_SUPPORTED_STATISTIC",
+      messageParameters = Map("tableName" -> tableName))

Review Comment:
   Wrap `tableName` in `toSQLId`.
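   
   For readers unfamiliar with the helper, the effect of quoting an identifier this way can be sketched as follows. `SqlIdSketch`, `quotePart`, and `toSqlIdSketch` are hypothetical stand-ins written for this example; the real `toSQLId` lives in Spark's error helpers and its exact behaviour is assumed here, not reproduced.
   
   ```scala
   // Simplified stand-in for the identifier quoting the reviewer asks for.
   object SqlIdSketch {
     // Wrap one name part in backticks, doubling any embedded backtick.
     def quotePart(part: String): String = "`" + part.replace("`", "``") + "`"
   
     // Quote every part of a multi-part name and join the parts with dots.
     def toSqlIdSketch(parts: Seq[String]): String =
       parts.map(quotePart).mkString(".")
   
     def main(args: Array[String]): Unit =
       println(toSqlIdSketch(Seq("testcat", "ns", "my table")))
   }
   ```
   
   Quoting each part keeps names with spaces or dots unambiguous in the error message.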





Re: [PR] [SPARK-39420][SQL] Support `ANALYZE TABLE` on Datasource V2 tables [spark]

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk commented on code in PR #41111:
URL: https://github.com/apache/spark/pull/41111#discussion_r1365665340


##########
sql/core/src/main/scala/org/apache/spark/sql/internal/SessionState.scala:
##########
@@ -76,6 +76,7 @@ private[sql] class SessionState(
     val sqlParser: ParserInterface,
     analyzerBuilder: () => Analyzer,
     optimizerBuilder: () => Optimizer,
+    statisticsCacheBuilder: () => StatisticsCache,

Review Comment:
   > Please give me some suggestions. Thanks.
   
   Our long-term goal is to avoid a special command for updating statistics. As in Delta (https://github.com/delta-io/delta), gathering the needed stats is hidden from users. I would propose outputting a proper error message saying that `ANALYZE TABLE` is not supported on V2 tables.





Re: [PR] [SPARK-39420][SQL] Support `ANALYZE TABLE` on Datasource V2 tables [spark]

Posted by "Hisoka-X (via GitHub)" <gi...@apache.org>.
Hisoka-X closed pull request #41111: [SPARK-39420][SQL] Support `ANALYZE TABLE` on Datasource V2 tables
URL: https://github.com/apache/spark/pull/41111

