Posted to reviews@spark.apache.org by "Hisoka-X (via GitHub)" <gi...@apache.org> on 2023/07/29 01:51:45 UTC

[GitHub] [spark] Hisoka-X opened a new pull request, #42220: [SPARK-44577][SQL] Fix INSERT BY NAME returns nonsensical error message

Hisoka-X opened a new pull request, #42220:
URL: https://github.com/apache/spark/pull/42220

   <!--
   Thanks for sending a pull request!  Here are some tips for you:
     1. If this is your first time, please read our contributor guidelines: https://spark.apache.org/contributing.html
     2. Ensure you have added or run the appropriate tests for your PR: https://spark.apache.org/developer-tools.html
     3. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP][SPARK-XXXX] Your PR title ...'.
     4. Be sure to keep the PR description updated to reflect all changes.
     5. Please write your PR title to summarize what this PR proposes.
     6. If possible, provide a concise example to reproduce the issue for a faster review.
     7. If you want to add a new configuration, please read the guideline first for naming configurations in
        'core/src/main/scala/org/apache/spark/internal/config/ConfigEntry.scala'.
     8. If you want to add or modify an error type or message, please read the guideline first in
        'core/src/main/resources/error/README.md'.
   -->
   
   ### What changes were proposed in this pull request?
   Fix INSERT BY NAME returning a nonsensical error message on the v1 data source. For example:
   ```sql
   CREATE TABLE bug(c1 INT);
   
   INSERT INTO bug BY NAME SELECT 1 AS c2;
   
   ==> Multi-part identifier cannot be empty.
   ```
   After this PR:
   ```
   [INCOMPATIBLE_DATA_FOR_TABLE.EXTRA_STRUCT_FIELDS] Cannot write incompatible data for the table `spark_catalog`.`default`.`bug`: Cannot write extra fields `c2` to the struct `table`.
   ```
   
   
   This PR also fixes the same issue when throwing other `INCOMPATIBLE_DATA_FOR_TABLE` errors.
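
The by-name matching that should drive this error can be sketched as follows. This is a minimal illustration, not Spark's actual implementation; `chooseByNameError` and its shape are hypothetical:

```scala
// Minimal sketch of how a by-name INSERT could decide which error to report
// for top-level columns. Names and structure are illustrative, not Spark code.
def chooseByNameError(tableCols: Seq[String], inputCols: Seq[String]): String = {
  // Input columns that match a table column by name.
  val matched = inputCols.filter(tableCols.contains)
  if (matched.size < inputCols.size) {
    // Some input columns did not match any table column by name.
    val missing = tableCols.filterNot(matched.contains)
    if (missing.nonEmpty) s"CANNOT_FIND_DATA: ${missing.head}" // e.g. table `c1`, input `c2`
    else "TOO_MANY_DATA_COLUMNS" // all table columns matched, but extras remain
  } else {
    "OK"
  }
}
```

For the example above (table column `c1`, input column `c2`), this reports that data for `c1` cannot be found, instead of the confusing empty-identifier error.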
   
   <!--
   Please clarify what changes you are proposing. The purpose of this section is to outline the changes and how this PR fixes the issue. 
   If possible, please consider writing useful notes for better and faster reviews in your PR. See the examples below.
     1. If you refactor some codes with changing classes, showing the class hierarchy will help reviewers.
     2. If you fix some SQL features, you can provide some references of other DBMSes.
     3. If there is design documentation, please add the link.
     4. If there is a discussion in the mailing list, please add the link.
   -->
   
   
   ### Why are the changes needed?
   The current error message for v1 INSERT BY NAME is nonsensical; this PR makes it meaningful.
   <!--
   Please clarify why the changes are needed. For instance,
     1. If you propose a new API, clarify the use case for a new API.
     2. If you fix a bug, you can clarify why it is a bug.
   -->
   
   
   ### Does this PR introduce _any_ user-facing change?
   Yes, the error message for v1 INSERT BY NAME changes.
   <!--
   Note that it means *any* user-facing change including all aspects such as the documentation fix.
   If yes, please clarify the previous behavior and the change this PR proposes - provide the console output, description and/or an example to show the behavior difference if possible.
   If possible, please also clarify if this is a user-facing change compared to the released Spark versions or within the unreleased branches such as master.
   If no, write 'No'.
   -->
   
   
   ### How was this patch tested?
   Added a new test.
   <!--
   If tests were added, say they were added here. Please make sure to add some test cases that check the changes thoroughly including negative and positive cases if possible.
   If it was tested in a way different from regular unit tests, please clarify how you tested step by step, ideally copy and paste-able, so that other reviewers can test and check, and descendants can verify in the future.
   If tests were not added, please describe why they were not added and/or why it was difficult to add.
   If benchmark tests were added, please run the benchmarks in GitHub Actions for the consistent environment, and the instructions could accord to: https://spark.apache.org/developer-tools.html#github-workflow-benchmarks.
   -->
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a diff in pull request #42220: [SPARK-44577][SQL] Fix INSERT BY NAME returns nonsensical error message

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #42220:
URL: https://github.com/apache/spark/pull/42220#discussion_r1297336418


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala:
##########
@@ -238,11 +238,17 @@ object TableOutputResolver {
 
     if (reordered.length == expectedCols.length) {
       if (matchedCols.size < inputCols.length) {
-        val extraCols = inputCols.filterNot(col => matchedCols.contains(col.name))
-          .map(col => s"${toSQLId(col.name)}").mkString(", ")
-        throw QueryCompilationErrors.incompatibleDataToTableExtraStructFieldsError(
-          tableName, colPath.quoted, extraCols
-        )
+        if (colPath.isEmpty) {
+          val cannotFindCol = expectedCols.filter(col => !matchedCols.contains(col.name)).head.name
+          throw QueryCompilationErrors.incompatibleDataToTableCannotFindDataError(tableName,
+            cannotFindCol)
+        } else {
+          val extraCols = inputCols.filterNot(col => matchedCols.contains(col.name))
+            .map(col => s"${toSQLId(col.name)}").mkString(", ")
+          throw QueryCompilationErrors.incompatibleDataToTableExtraStructFieldsError(

Review Comment:
   My question is simple: this is the `if (matchedCols.size < inputCols.length) {` branch, which means there are extra columns. Why do we throw `incompatibleDataToTableCannotFindDataError` for top-level columns, which seems to be a missing-columns error?





[GitHub] [spark] cloud-fan commented on a diff in pull request #42220: [SPARK-44577][SQL] Fix INSERT BY NAME returns nonsensical error message

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #42220:
URL: https://github.com/apache/spark/pull/42220#discussion_r1296774956


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala:
##########
@@ -238,11 +238,17 @@ object TableOutputResolver {
 
     if (reordered.length == expectedCols.length) {
       if (matchedCols.size < inputCols.length) {
-        val extraCols = inputCols.filterNot(col => matchedCols.contains(col.name))
-          .map(col => s"${toSQLId(col.name)}").mkString(", ")
-        throw QueryCompilationErrors.incompatibleDataToTableExtraStructFieldsError(
-          tableName, colPath.quoted, extraCols
-        )
+        if (colPath.isEmpty) {
+          val cannotFindCol = expectedCols.filter(col => !matchedCols.contains(col.name)).head.name
+          throw QueryCompilationErrors.incompatibleDataToTableCannotFindDataError(tableName,
+            cannotFindCol)
+        } else {
+          val extraCols = inputCols.filterNot(col => matchedCols.contains(col.name))
+            .map(col => s"${toSQLId(col.name)}").mkString(", ")
+          throw QueryCompilationErrors.incompatibleDataToTableExtraStructFieldsError(

Review Comment:
   extra field? I think it's for missing column/field here?





[GitHub] [spark] cloud-fan commented on pull request #42220: [SPARK-44577][SQL] Fix INSERT BY NAME returns nonsensical error message

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on PR #42220:
URL: https://github.com/apache/spark/pull/42220#issuecomment-1669289965

   @MaxGekk does this PR align with your plan of unifying the assignment errors?




[GitHub] [spark] cloud-fan closed pull request #42220: [SPARK-44577][SQL] Fix INSERT BY NAME returns nonsensical error message

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan closed pull request #42220: [SPARK-44577][SQL] Fix INSERT BY NAME returns nonsensical error message
URL: https://github.com/apache/spark/pull/42220




[GitHub] [spark] MaxGekk commented on pull request #42220: [SPARK-44577][SQL] Fix INSERT BY NAME returns nonsensical error message

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk commented on PR #42220:
URL: https://github.com/apache/spark/pull/42220#issuecomment-1669542270

   @cloud-fan In my PR https://github.com/apache/spark/pull/42393, I am only touching non-`BY NAME` functionality so far.




[GitHub] [spark] cloud-fan commented on a diff in pull request #42220: [SPARK-44577][SQL] Fix INSERT BY NAME returns nonsensical error message

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #42220:
URL: https://github.com/apache/spark/pull/42220#discussion_r1297955542


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala:
##########
@@ -238,11 +238,17 @@ object TableOutputResolver {
 
     if (reordered.length == expectedCols.length) {
       if (matchedCols.size < inputCols.length) {
-        val extraCols = inputCols.filterNot(col => matchedCols.contains(col.name))
-          .map(col => s"${toSQLId(col.name)}").mkString(", ")
-        throw QueryCompilationErrors.incompatibleDataToTableExtraStructFieldsError(
-          tableName, colPath.quoted, extraCols
-        )
+        if (colPath.isEmpty) {
+          val cannotFindCol = expectedCols.filter(col => !matchedCols.contains(col.name)).head.name
+          throw QueryCompilationErrors.incompatibleDataToTableCannotFindDataError(tableName,

Review Comment:
   The code here is so convoluted...
   
   `if (reordered.length == expectedCols.length)` I think this means there is no missing col? For every table column, we can find a match in the input columns.





[GitHub] [spark] cloud-fan commented on pull request #42220: [SPARK-44577][SQL] Fix INSERT BY NAME returns nonsensical error message

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on PR #42220:
URL: https://github.com/apache/spark/pull/42220#issuecomment-1657951810

   can we update the PR description for the new changes?




[GitHub] [spark] cloud-fan commented on a diff in pull request #42220: [SPARK-44577][SQL] Fix INSERT BY NAME returns nonsensical error message

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #42220:
URL: https://github.com/apache/spark/pull/42220#discussion_r1278987087


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala:
##########
@@ -263,16 +268,26 @@ object TableOutputResolver {
       val extraColsStr = inputCols.takeRight(inputCols.size - expectedCols.size)
         .map(col => toSQLId(col.name))
         .mkString(", ")
-      throw QueryCompilationErrors.incompatibleDataToTableExtraStructFieldsError(
-        tableName, colPath.quoted, extraColsStr
-      )
+      if (colPath.isEmpty) {
+        throw QueryCompilationErrors.cannotWriteTooManyColumnsToTableError(tableName,
+          expectedCols.map(_.name), inputCols.map(_.toAttribute))
+      } else {
+        throw QueryCompilationErrors.incompatibleDataToTableExtraStructFieldsError(
+          tableName, colPath.quoted, extraColsStr
+        )
+      }
     } else if (inputCols.size < expectedCols.size) {
       val missingColsStr = expectedCols.takeRight(expectedCols.size - inputCols.size)
         .map(col => toSQLId(col.name))
         .mkString(", ")
-      throw QueryCompilationErrors.incompatibleDataToTableStructMissingFieldsError(
-        tableName, colPath.quoted, missingColsStr
-      )
+      if (colPath.isEmpty) {

Review Comment:
   this appears three times, we can add a method for it.





[GitHub] [spark] Hisoka-X commented on a diff in pull request #42220: [SPARK-44577][SQL] Fix INSERT BY NAME returns nonsensical error message

Posted by "Hisoka-X (via GitHub)" <gi...@apache.org>.
Hisoka-X commented on code in PR #42220:
URL: https://github.com/apache/spark/pull/42220#discussion_r1285534129


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala:
##########
@@ -238,11 +238,17 @@ object TableOutputResolver {
 
     if (reordered.length == expectedCols.length) {
       if (matchedCols.size < inputCols.length) {
-        val extraCols = inputCols.filterNot(col => matchedCols.contains(col.name))
-          .map(col => s"${toSQLId(col.name)}").mkString(", ")
-        throw QueryCompilationErrors.incompatibleDataToTableExtraStructFieldsError(
-          tableName, colPath.quoted, extraCols
-        )
+        if (colPath.isEmpty) {
+          val cannotFindCol = expectedCols.filter(col => !matchedCols.contains(col.name)).head.name

Review Comment:
   In fact, it cannot be empty. When the code reaches this point, `expectedCols` must contain a column that does not exist in `matchedCols`. The test case proves it.





[GitHub] [spark] cloud-fan commented on pull request #42220: [SPARK-44577][SQL] Fix INSERT BY NAME returns nonsensical error message

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on PR #42220:
URL: https://github.com/apache/spark/pull/42220#issuecomment-1659420608

   ```
   [INSERT_COLUMN_ARITY_MISMATCH.NOT_ENOUGH_DATA_COLUMNS] Cannot write to `spark_catalog`.`default`.`bug`, the reason is not enough data columns:
   Table columns: `c1`.
   Data columns: `c2`.
   ```
   
   For by-name insert, shall we just say "incompatible data columns"? The number of columns does match here.




[GitHub] [spark] cloud-fan commented on a diff in pull request #42220: [SPARK-44577][SQL] Fix INSERT BY NAME returns nonsensical error message

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #42220:
URL: https://github.com/apache/spark/pull/42220#discussion_r1297898934


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala:
##########
@@ -238,11 +238,17 @@ object TableOutputResolver {
 
     if (reordered.length == expectedCols.length) {
       if (matchedCols.size < inputCols.length) {
-        val extraCols = inputCols.filterNot(col => matchedCols.contains(col.name))
-          .map(col => s"${toSQLId(col.name)}").mkString(", ")
-        throw QueryCompilationErrors.incompatibleDataToTableExtraStructFieldsError(
-          tableName, colPath.quoted, extraCols
-        )
+        if (colPath.isEmpty) {
+          val cannotFindCol = expectedCols.filter(col => !matchedCols.contains(col.name)).head.name
+          throw QueryCompilationErrors.incompatibleDataToTableCannotFindDataError(tableName,
+            cannotFindCol)
+        } else {
+          val extraCols = inputCols.filterNot(col => matchedCols.contains(col.name))
+            .map(col => s"${toSQLId(col.name)}").mkString(", ")
+          throw QueryCompilationErrors.incompatibleDataToTableExtraStructFieldsError(

Review Comment:
   `table: [a, b, c], input [a, c, d]`
   Extra column and missing column are orthogonal, they can both occur, or only one of them. Can you elaborate on the error checking logic here?





[GitHub] [spark] Hisoka-X commented on a diff in pull request #42220: [SPARK-44577][SQL] Fix INSERT BY NAME returns nonsensical error message

Posted by "Hisoka-X (via GitHub)" <gi...@apache.org>.
Hisoka-X commented on code in PR #42220:
URL: https://github.com/apache/spark/pull/42220#discussion_r1278857830


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala:
##########
@@ -241,7 +241,7 @@ object TableOutputResolver {
         val extraCols = inputCols.filterNot(col => matchedCols.contains(col.name))
           .map(col => s"${toSQLId(col.name)}").mkString(", ")
         throw QueryCompilationErrors.incompatibleDataToTableExtraStructFieldsError(

Review Comment:
   Changed. Please check again, thanks.





[GitHub] [spark] Hisoka-X commented on pull request #42220: [SPARK-44577][SQL] Fix INSERT BY NAME returns nonsensical error message

Posted by "Hisoka-X (via GitHub)" <gi...@apache.org>.
Hisoka-X commented on PR #42220:
URL: https://github.com/apache/spark/pull/42220#issuecomment-1657961173

   > can we update the PR description for the new changes?
   
   Done




[GitHub] [spark] Hisoka-X commented on a diff in pull request #42220: [SPARK-44577][SQL] Fix INSERT BY NAME returns nonsensical error message

Posted by "Hisoka-X (via GitHub)" <gi...@apache.org>.
Hisoka-X commented on code in PR #42220:
URL: https://github.com/apache/spark/pull/42220#discussion_r1296767839


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala:
##########
@@ -238,11 +238,17 @@ object TableOutputResolver {
 
     if (reordered.length == expectedCols.length) {
       if (matchedCols.size < inputCols.length) {
-        val extraCols = inputCols.filterNot(col => matchedCols.contains(col.name))
-          .map(col => s"${toSQLId(col.name)}").mkString(", ")
-        throw QueryCompilationErrors.incompatibleDataToTableExtraStructFieldsError(
-          tableName, colPath.quoted, extraCols
-        )
+        if (colPath.isEmpty) {
+          val cannotFindCol = expectedCols.filter(col => !matchedCols.contains(col.name)).head.name
+          throw QueryCompilationErrors.incompatibleDataToTableCannotFindDataError(tableName,
+            cannotFindCol)
+        } else {
+          val extraCols = inputCols.filterNot(col => matchedCols.contains(col.name))
+            .map(col => s"${toSQLId(col.name)}").mkString(", ")
+          throw QueryCompilationErrors.incompatibleDataToTableExtraStructFieldsError(

Review Comment:
   I think so. `QueryCompilationErrors.incompatibleDataToTableExtraStructFieldsError` only suits nested fields, not top-level columns.
   
   Here is the message template:
   ```
   Cannot write extra fields `<extraFields>` to the struct `<colName>`.
   ```
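
This template is why the original error is nonsensical for top-level columns: with an empty `colPath` there is no struct name to quote into `<colName>`. A rough illustration of the failure mode (the `quoted` helper here is a simplified stand-in, not Spark's actual implementation):

```scala
// Rough illustration of why an empty column path breaks the message:
// quoting a multi-part identifier requires at least one name part.
def quoted(path: Seq[String]): String = {
  require(path.nonEmpty, "Multi-part identifier cannot be empty.")
  path.map(p => s"`$p`").mkString(".")
}
```

For a nested field the path is non-empty (e.g. `` `table`.`s` ``), but for a top-level column the path is empty and quoting fails before a useful message can be produced.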
   





[GitHub] [spark] Hisoka-X commented on a diff in pull request #42220: [SPARK-44577][SQL] Fix INSERT BY NAME returns nonsensical error message

Posted by "Hisoka-X (via GitHub)" <gi...@apache.org>.
Hisoka-X commented on code in PR #42220:
URL: https://github.com/apache/spark/pull/42220#discussion_r1297314416


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala:
##########
@@ -238,11 +238,17 @@ object TableOutputResolver {
 
     if (reordered.length == expectedCols.length) {
       if (matchedCols.size < inputCols.length) {
-        val extraCols = inputCols.filterNot(col => matchedCols.contains(col.name))
-          .map(col => s"${toSQLId(col.name)}").mkString(", ")
-        throw QueryCompilationErrors.incompatibleDataToTableExtraStructFieldsError(
-          tableName, colPath.quoted, extraCols
-        )
+        if (colPath.isEmpty) {
+          val cannotFindCol = expectedCols.filter(col => !matchedCols.contains(col.name)).head.name
+          throw QueryCompilationErrors.incompatibleDataToTableCannotFindDataError(tableName,
+            cannotFindCol)
+        } else {
+          val extraCols = inputCols.filterNot(col => matchedCols.contains(col.name))
+            .map(col => s"${toSQLId(col.name)}").mkString(", ")
+          throw QueryCompilationErrors.incompatibleDataToTableExtraStructFieldsError(

Review Comment:
   Here is the mapping between data source version, error type, and where each error is thrown:
   | API | Type | Error | Info | Where throw |
   | --- | --- | --- | --- | --- |
   | V1  | Top Column | miss field | CANNOT_FIND_DATA | TableOutputResolver:243 |
   | V1  | Top Column | extra field | TOO_MANY_DATA_COLUMNS | TableOutputResolver:51 |
   | V2  | Top Column | extra field | TOO_MANY_DATA_COLUMNS | TableOutputResolver:51 |
   | V2  | Top Column | miss field | CANNOT_FIND_DATA | TableOutputResolver:201 |
   | V2  | Struct | miss field | CANNOT_FIND_DATA | TableOutputResolver:201 |
   | V2  | Struct | extra field | EXTRA_STRUCT_FIELDS | TableOutputResolver:248 |
   
   There are two types of error info: `CANNOT_FIND_DATA` and `EXTRA_STRUCT_FIELDS`/`TOO_MANY_DATA_COLUMNS`.
   
   The key difference between `TableOutputResolver:248` and `TableOutputResolver:243` is that line 243 only reports `CANNOT_FIND_DATA` for top-level columns, never `EXTRA_STRUCT_FIELDS`. That is because the extra-columns case is already checked at `TableOutputResolver:51` for top-level columns (`TOO_MANY_DATA_COLUMNS` plays the same role there as `EXTRA_STRUCT_FIELDS`), but that check only covers top-level columns. So by the time we reach `TableOutputResolver:243`, the only possible error is `CANNOT_FIND_DATA`. Structs, however, have no check analogous to `TOO_MANY_DATA_COLUMNS`, so we must throw `EXTRA_STRUCT_FIELDS` for them; a struct's `CANNOT_FIND_DATA` case is already checked at `TableOutputResolver:201`, so when we reach `TableOutputResolver:248` the only possible error is `EXTRA_STRUCT_FIELDS`. Note that the `CANNOT_FIND_DATA` check at `TableOutputResolver:201` is skipped for v1 top-level columns because they have a default value (`null`).
   
   If we want to improve this logic, I think we should add a `TOO_MANY_DATA_COLUMNS`/`EXTRA_STRUCT_FIELDS` check for structs before `reorderColumnsByName`; then we could replace `TableOutputResolver:248` with `TableOutputResolver:243`.
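
The mapping above can be condensed into a small decision function. This is purely illustrative; the names and the `Option`-based shape are hypothetical, not Spark's API:

```scala
// Illustrative encoding of the table above: which error is raised for a
// mismatch, depending on whether it concerns a top-level column or a nested
// struct field, and whether columns are extra or missing.
sealed trait WriteError
case object TooManyDataColumns extends WriteError // top-level, extra columns
case object CannotFindData extends WriteError     // missing column/field
case object ExtraStructFields extends WriteError  // nested struct, extra fields

def errorFor(topLevel: Boolean, extra: Boolean, missing: Boolean): Option[WriteError] =
  (topLevel, extra, missing) match {
    case (_, false, false)    => None                     // schemas line up
    case (true, true, false)  => Some(TooManyDataColumns) // checked early (TableOutputResolver:51)
    case (_, _, true)         => Some(CannotFindData)     // missing wins (TableOutputResolver:201/243)
    case (false, true, false) => Some(ExtraStructFields)  // struct-only check (TableOutputResolver:248)
  }
```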
   
   
   





[GitHub] [spark] Hisoka-X commented on a diff in pull request #42220: [SPARK-44577][SQL] Fix INSERT BY NAME returns nonsensical error message

Posted by "Hisoka-X (via GitHub)" <gi...@apache.org>.
Hisoka-X commented on code in PR #42220:
URL: https://github.com/apache/spark/pull/42220#discussion_r1297914592


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala:
##########
@@ -238,11 +238,17 @@ object TableOutputResolver {
 
     if (reordered.length == expectedCols.length) {
       if (matchedCols.size < inputCols.length) {
-        val extraCols = inputCols.filterNot(col => matchedCols.contains(col.name))
-          .map(col => s"${toSQLId(col.name)}").mkString(", ")
-        throw QueryCompilationErrors.incompatibleDataToTableExtraStructFieldsError(
-          tableName, colPath.quoted, extraCols
-        )
+        if (colPath.isEmpty) {
+          val cannotFindCol = expectedCols.filter(col => !matchedCols.contains(col.name)).head.name
+          throw QueryCompilationErrors.incompatibleDataToTableCannotFindDataError(tableName,
+            cannotFindCol)
+        } else {
+          val extraCols = inputCols.filterNot(col => matchedCols.contains(col.name))
+            .map(col => s"${toSQLId(col.name)}").mkString(", ")
+          throw QueryCompilationErrors.incompatibleDataToTableExtraStructFieldsError(

Review Comment:
   Only extra columns => `TOO_MANY_DATA_COLUMNS`
   Only missing columns => `CANNOT_FIND_DATA`
   Both => `CANNOT_FIND_DATA`





[GitHub] [spark] Hisoka-X commented on a diff in pull request #42220: [SPARK-44577][SQL] Fix INSERT BY NAME returns nonsensical error message

Posted by "Hisoka-X (via GitHub)" <gi...@apache.org>.
Hisoka-X commented on code in PR #42220:
URL: https://github.com/apache/spark/pull/42220#discussion_r1297935032


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala:
##########
@@ -238,11 +238,17 @@ object TableOutputResolver {
 
     if (reordered.length == expectedCols.length) {
       if (matchedCols.size < inputCols.length) {
-        val extraCols = inputCols.filterNot(col => matchedCols.contains(col.name))
-          .map(col => s"${toSQLId(col.name)}").mkString(", ")
-        throw QueryCompilationErrors.incompatibleDataToTableExtraStructFieldsError(
-          tableName, colPath.quoted, extraCols
-        )
+        if (colPath.isEmpty) {
+          val cannotFindCol = expectedCols.filter(col => !matchedCols.contains(col.name)).head.name
+          throw QueryCompilationErrors.incompatibleDataToTableCannotFindDataError(tableName,

Review Comment:
   Same as top-level column, both https://github.com/apache/spark/pull/42220/files/4936877b40bbd46a53450c0199ebdad38cd121c2#diff-6e331e8f1c67b5920fb46263b6e582ec6e6a253ee45543559c9692a72a1a40ecR201-R203





[GitHub] [spark] Hisoka-X commented on a diff in pull request #42220: [SPARK-44577][SQL] Fix INSERT BY NAME returns nonsensical error message

Posted by "Hisoka-X (via GitHub)" <gi...@apache.org>.
Hisoka-X commented on code in PR #42220:
URL: https://github.com/apache/spark/pull/42220#discussion_r1296857130


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala:
##########
@@ -238,11 +238,17 @@ object TableOutputResolver {
 
     if (reordered.length == expectedCols.length) {
       if (matchedCols.size < inputCols.length) {
-        val extraCols = inputCols.filterNot(col => matchedCols.contains(col.name))
-          .map(col => s"${toSQLId(col.name)}").mkString(", ")
-        throw QueryCompilationErrors.incompatibleDataToTableExtraStructFieldsError(
-          tableName, colPath.quoted, extraCols
-        )
+        if (colPath.isEmpty) {
+          val cannotFindCol = expectedCols.filter(col => !matchedCols.contains(col.name)).head.name
+          throw QueryCompilationErrors.incompatibleDataToTableCannotFindDataError(tableName,
+            cannotFindCol)
+        } else {
+          val extraCols = inputCols.filterNot(col => matchedCols.contains(col.name))
+            .map(col => s"${toSQLId(col.name)}").mkString(", ")
+          throw QueryCompilationErrors.incompatibleDataToTableExtraStructFieldsError(

Review Comment:
   This is the old behavior: if the struct has `x int, y int` and we want to write `x int, y int, z int` into it, it will throw `extra fields z` here. But this does not happen for top-level columns (they throw `INSERT_COLUMN_ARITY_MISMATCH.TOO_MANY_DATA_COLUMNS` instead).
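
   The struct case described here can be sketched as follows (a simplified standalone illustration with hypothetical field names, not the real resolver):

   ```scala
   // Table column is a struct with fields (x, y); the query writes a struct
   // with fields (x, y, z). For nested struct fields (colPath non-empty) the
   // extra field is reported directly rather than as a top-level arity error.
   def extraStructFields(expectedFields: Seq[String], inputFields: Seq[String]): Seq[String] =
     inputFields.filterNot(expectedFields.contains)

   val extras = extraStructFields(Seq("x", "y"), Seq("x", "y", "z"))
   // extras contains "z" -> "Cannot write extra fields `z` to the struct"
   ```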





[GitHub] [spark] Hisoka-X commented on pull request #42220: [SPARK-44577][SQL] Fix INSERT BY NAME returns nonsensical error message

Posted by "Hisoka-X (via GitHub)" <gi...@apache.org>.
Hisoka-X commented on PR #42220:
URL: https://github.com/apache/spark/pull/42220#issuecomment-1659449519

   > ```
   > [INSERT_COLUMN_ARITY_MISMATCH.NOT_ENOUGH_DATA_COLUMNS] Cannot write to `spark_catalog`.`default`.`bug`, the reason is not enough data columns:
   > Table columns: `c1`.
   > Data columns: `c2`.
   > ```
   > 
   > For by-name insert, shall we just say "incompatible data columns"? The number of columns does match here.
   
   Oh, let me align v1 and v2 errors.




[GitHub] [spark] Hisoka-X commented on pull request #42220: [SPARK-44577][SQL] Fix INSERT BY NAME returns nonsensical error message

Posted by "Hisoka-X (via GitHub)" <gi...@apache.org>.
Hisoka-X commented on PR #42220:
URL: https://github.com/apache/spark/pull/42220#issuecomment-1702668615

   Thanks @cloud-fan and @MaxGekk 




[GitHub] [spark] Hisoka-X commented on a diff in pull request #42220: [SPARK-44577][SQL] Fix INSERT BY NAME returns nonsensical error message

Posted by "Hisoka-X (via GitHub)" <gi...@apache.org>.
Hisoka-X commented on code in PR #42220:
URL: https://github.com/apache/spark/pull/42220#discussion_r1297961931


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala:
##########
@@ -238,11 +238,17 @@ object TableOutputResolver {
 
     if (reordered.length == expectedCols.length) {
       if (matchedCols.size < inputCols.length) {
-        val extraCols = inputCols.filterNot(col => matchedCols.contains(col.name))
-          .map(col => s"${toSQLId(col.name)}").mkString(", ")
-        throw QueryCompilationErrors.incompatibleDataToTableExtraStructFieldsError(
-          tableName, colPath.quoted, extraCols
-        )
+        if (colPath.isEmpty) {
+          val cannotFindCol = expectedCols.filter(col => !matchedCols.contains(col.name)).head.name
+          throw QueryCompilationErrors.incompatibleDataToTableCannotFindDataError(tableName,

Review Comment:
   > if (reordered.length == expectedCols.length) I think this means there is no missing col
   
   Yep, but there is a special V1 case, e.g. table [x, y, z], input cols [x, y, k]: when we set `fillDefaultValue` to true, `reordered` will be [x, y, null as z], so `reordered.length == expectedCols.length` will be true.
   
   This PR only changes the error to `CANNOT_FIND_DATA`, but I'm not sure whether we should support a case like `table [x, y, z], inputcol: [x, y]` and just fill `z` with `null`.
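
   The V1 reordering described here can be sketched as follows (a simplified standalone illustration; the `fillDefaultValue` behavior is an assumption based on this discussion, not Spark's actual code):

   ```scala
   // Hypothetical sketch of by-name reordering with default-value filling.
   // Table columns: x, y, z; input columns: x, y, k.
   case class Col(name: String, expr: String)

   def reorder(expected: Seq[String], input: Seq[Col], fillDefaultValue: Boolean): Seq[Col] =
     expected.flatMap { name =>
       input.find(_.name == name)                           // matched by name
         .orElse(if (fillDefaultValue) Some(Col(name, "null")) else None)
     }

   val reordered =
     reorder(Seq("x", "y", "z"), Seq(Col("x", "x"), Col("y", "y"), Col("k", "k")),
       fillDefaultValue = true)
   // reordered is [x, y, null as z]: its length equals expectedCols.length even
   // though input column k was never matched, which is why the separate check on
   // matchedCols.size is still needed.
   ```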
   
   
   





[GitHub] [spark] Hisoka-X commented on a diff in pull request #42220: [SPARK-44577][SQL] Fix INSERT BY NAME returns nonsensical error message

Posted by "Hisoka-X (via GitHub)" <gi...@apache.org>.
Hisoka-X commented on code in PR #42220:
URL: https://github.com/apache/spark/pull/42220#discussion_r1298036900


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala:
##########
@@ -238,11 +238,17 @@ object TableOutputResolver {
 
     if (reordered.length == expectedCols.length) {
       if (matchedCols.size < inputCols.length) {
-        val extraCols = inputCols.filterNot(col => matchedCols.contains(col.name))
-          .map(col => s"${toSQLId(col.name)}").mkString(", ")
-        throw QueryCompilationErrors.incompatibleDataToTableExtraStructFieldsError(
-          tableName, colPath.quoted, extraCols
-        )
+        if (colPath.isEmpty) {
+          val cannotFindCol = expectedCols.filter(col => !matchedCols.contains(col.name)).head.name
+          throw QueryCompilationErrors.incompatibleDataToTableCannotFindDataError(tableName,

Review Comment:
   make sense, I had changed.





[GitHub] [spark] cloud-fan commented on a diff in pull request #42220: [SPARK-44577][SQL] Fix INSERT BY NAME returns nonsensical error message

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #42220:
URL: https://github.com/apache/spark/pull/42220#discussion_r1297174257


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala:
##########
@@ -238,11 +238,17 @@ object TableOutputResolver {
 
     if (reordered.length == expectedCols.length) {
       if (matchedCols.size < inputCols.length) {
-        val extraCols = inputCols.filterNot(col => matchedCols.contains(col.name))
-          .map(col => s"${toSQLId(col.name)}").mkString(", ")
-        throw QueryCompilationErrors.incompatibleDataToTableExtraStructFieldsError(
-          tableName, colPath.quoted, extraCols
-        )
+        if (colPath.isEmpty) {
+          val cannotFindCol = expectedCols.filter(col => !matchedCols.contains(col.name)).head.name
+          throw QueryCompilationErrors.incompatibleDataToTableCannotFindDataError(tableName,
+            cannotFindCol)
+        } else {
+          val extraCols = inputCols.filterNot(col => matchedCols.contains(col.name))
+            .map(col => s"${toSQLId(col.name)}").mkString(", ")
+          throw QueryCompilationErrors.incompatibleDataToTableExtraStructFieldsError(

Review Comment:
   I still don't quite understand the code here. If it's for extra columns/fields, why do we throw an error for top-level columns?





[GitHub] [spark] Hisoka-X commented on a diff in pull request #42220: [SPARK-44577][SQL] Fix INSERT BY NAME returns nonsensical error message

Posted by "Hisoka-X (via GitHub)" <gi...@apache.org>.
Hisoka-X commented on code in PR #42220:
URL: https://github.com/apache/spark/pull/42220#discussion_r1297920816


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala:
##########
@@ -238,11 +238,17 @@ object TableOutputResolver {
 
     if (reordered.length == expectedCols.length) {
       if (matchedCols.size < inputCols.length) {
-        val extraCols = inputCols.filterNot(col => matchedCols.contains(col.name))
-          .map(col => s"${toSQLId(col.name)}").mkString(", ")
-        throw QueryCompilationErrors.incompatibleDataToTableExtraStructFieldsError(
-          tableName, colPath.quoted, extraCols
-        )
+        if (colPath.isEmpty) {
+          val cannotFindCol = expectedCols.filter(col => !matchedCols.contains(col.name)).head.name
+          throw QueryCompilationErrors.incompatibleDataToTableCannotFindDataError(tableName,

Review Comment:
   https://github.com/apache/spark/pull/42220#discussion_r1297314416
   
   The main reason is that when we reach here, the top-level columns cannot hit `TOO_MANY_DATA_COLUMNS`, because that is already checked in `TableOutputResolver:51`. The only remaining possibility is having both an extra column and a missing column, so we throw `CANNOT_FIND_DATA` to keep v1 and v2 aligned.





[GitHub] [spark] cloud-fan commented on a diff in pull request #42220: [SPARK-44577][SQL] Fix INSERT BY NAME returns nonsensical error message

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #42220:
URL: https://github.com/apache/spark/pull/42220#discussion_r1278822211


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala:
##########
@@ -241,7 +241,7 @@ object TableOutputResolver {
         val extraCols = inputCols.filterNot(col => matchedCols.contains(col.name))
           .map(col => s"${toSQLId(col.name)}").mkString(", ")
         throw QueryCompilationErrors.incompatibleDataToTableExtraStructFieldsError(

Review Comment:
   I think we should throw `QueryCompilationErrors.cannotWriteTooManyColumnsToTableError` or `cannotWriteNotEnoughColumnsToTableError` here if `colPath` is empty.





[GitHub] [spark] MaxGekk commented on a diff in pull request #42220: [SPARK-44577][SQL] Fix INSERT BY NAME returns nonsensical error message

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk commented on code in PR #42220:
URL: https://github.com/apache/spark/pull/42220#discussion_r1285461310


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala:
##########
@@ -238,11 +238,17 @@ object TableOutputResolver {
 
     if (reordered.length == expectedCols.length) {
       if (matchedCols.size < inputCols.length) {
-        val extraCols = inputCols.filterNot(col => matchedCols.contains(col.name))
-          .map(col => s"${toSQLId(col.name)}").mkString(", ")
-        throw QueryCompilationErrors.incompatibleDataToTableExtraStructFieldsError(
-          tableName, colPath.quoted, extraCols
-        )
+        if (colPath.isEmpty) {
+          val cannotFindCol = expectedCols.filter(col => !matchedCols.contains(col.name)).head.name
+          throw QueryCompilationErrors.incompatibleDataToTableCannotFindDataError(tableName,
+            cannotFindCol)
+        } else {
+          val extraCols = inputCols.filterNot(col => matchedCols.contains(col.name))
+                  .map(col => s"${toSQLId(col.name)}").mkString(", ")

Review Comment:
   Fix indentation



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala:
##########
@@ -238,11 +238,17 @@ object TableOutputResolver {
 
     if (reordered.length == expectedCols.length) {
       if (matchedCols.size < inputCols.length) {
-        val extraCols = inputCols.filterNot(col => matchedCols.contains(col.name))
-          .map(col => s"${toSQLId(col.name)}").mkString(", ")
-        throw QueryCompilationErrors.incompatibleDataToTableExtraStructFieldsError(
-          tableName, colPath.quoted, extraCols
-        )
+        if (colPath.isEmpty) {
+          val cannotFindCol = expectedCols.filter(col => !matchedCols.contains(col.name)).head.name

Review Comment:
   Can this be empty `expectedCols.filter(col => !matchedCols.contains(col.name))`? If so, `.head` can fail.





[GitHub] [spark] Hisoka-X commented on a diff in pull request #42220: [SPARK-44577][SQL] Fix INSERT BY NAME returns nonsensical error message

Posted by "Hisoka-X (via GitHub)" <gi...@apache.org>.
Hisoka-X commented on code in PR #42220:
URL: https://github.com/apache/spark/pull/42220#discussion_r1285534534


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala:
##########
@@ -238,11 +238,17 @@ object TableOutputResolver {
 
     if (reordered.length == expectedCols.length) {
       if (matchedCols.size < inputCols.length) {
-        val extraCols = inputCols.filterNot(col => matchedCols.contains(col.name))
-          .map(col => s"${toSQLId(col.name)}").mkString(", ")
-        throw QueryCompilationErrors.incompatibleDataToTableExtraStructFieldsError(
-          tableName, colPath.quoted, extraCols
-        )
+        if (colPath.isEmpty) {
+          val cannotFindCol = expectedCols.filter(col => !matchedCols.contains(col.name)).head.name
+          throw QueryCompilationErrors.incompatibleDataToTableCannotFindDataError(tableName,
+            cannotFindCol)
+        } else {
+          val extraCols = inputCols.filterNot(col => matchedCols.contains(col.name))
+                  .map(col => s"${toSQLId(col.name)}").mkString(", ")

Review Comment:
   Thanks! I missed it.





[GitHub] [spark] Hisoka-X commented on a diff in pull request #42220: [SPARK-44577][SQL] Fix INSERT BY NAME returns nonsensical error message

Posted by "Hisoka-X (via GitHub)" <gi...@apache.org>.
Hisoka-X commented on code in PR #42220:
URL: https://github.com/apache/spark/pull/42220#discussion_r1311229745


##########
common/utils/src/main/resources/error/error-classes.json:
##########
@@ -1025,6 +1025,11 @@
           "Cannot safely cast <colName> <srcType> to <targetType>."
         ]
       },
+      "EXTRA_FIELDS" : {

Review Comment:
   Done





[GitHub] [spark] Hisoka-X commented on a diff in pull request #42220: [SPARK-44577][SQL] Fix INSERT BY NAME returns nonsensical error message

Posted by "Hisoka-X (via GitHub)" <gi...@apache.org>.
Hisoka-X commented on code in PR #42220:
URL: https://github.com/apache/spark/pull/42220#discussion_r1297961931


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala:
##########
@@ -238,11 +238,17 @@ object TableOutputResolver {
 
     if (reordered.length == expectedCols.length) {
       if (matchedCols.size < inputCols.length) {
-        val extraCols = inputCols.filterNot(col => matchedCols.contains(col.name))
-          .map(col => s"${toSQLId(col.name)}").mkString(", ")
-        throw QueryCompilationErrors.incompatibleDataToTableExtraStructFieldsError(
-          tableName, colPath.quoted, extraCols
-        )
+        if (colPath.isEmpty) {
+          val cannotFindCol = expectedCols.filter(col => !matchedCols.contains(col.name)).head.name
+          throw QueryCompilationErrors.incompatibleDataToTableCannotFindDataError(tableName,

Review Comment:
   > if (reordered.length == expectedCols.length) I think this means there is no missing col
   
   Yep, but there is a special V1 case, e.g. table [x, y, z], input cols [x, y, k]: when we set `fillDefaultValue` to true, `reordered` will be [x, y, null as z], so `reordered.length == expectedCols.length` will be true.
   
   This PR only changes the error to `CANNOT_FIND_DATA`, but I'm not sure whether we should support a case like `table [x, y, z], inputcol: [x, y, k]` and just fill `z` with `null`.
   
   In the future, when we support default values on DS V2, we will face this problem again.
   
   
   





[GitHub] [spark] cloud-fan commented on a diff in pull request #42220: [SPARK-44577][SQL] Fix INSERT BY NAME returns nonsensical error message

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #42220:
URL: https://github.com/apache/spark/pull/42220#discussion_r1296759688


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala:
##########
@@ -238,11 +238,17 @@ object TableOutputResolver {
 
     if (reordered.length == expectedCols.length) {
       if (matchedCols.size < inputCols.length) {
-        val extraCols = inputCols.filterNot(col => matchedCols.contains(col.name))
-          .map(col => s"${toSQLId(col.name)}").mkString(", ")
-        throw QueryCompilationErrors.incompatibleDataToTableExtraStructFieldsError(
-          tableName, colPath.quoted, extraCols
-        )
+        if (colPath.isEmpty) {
+          val cannotFindCol = expectedCols.filter(col => !matchedCols.contains(col.name)).head.name
+          throw QueryCompilationErrors.incompatibleDataToTableCannotFindDataError(tableName,
+            cannotFindCol)
+        } else {
+          val extraCols = inputCols.filterNot(col => matchedCols.contains(col.name))
+            .map(col => s"${toSQLId(col.name)}").mkString(", ")
+          throw QueryCompilationErrors.incompatibleDataToTableExtraStructFieldsError(

Review Comment:
   so top-level column and nested field have different error classes?







[GitHub] [spark] Hisoka-X commented on a diff in pull request #42220: [SPARK-44577][SQL] Fix INSERT BY NAME returns nonsensical error message

Posted by "Hisoka-X (via GitHub)" <gi...@apache.org>.
Hisoka-X commented on code in PR #42220:
URL: https://github.com/apache/spark/pull/42220#discussion_r1297376865


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala:
##########
@@ -238,11 +238,17 @@ object TableOutputResolver {
 
     if (reordered.length == expectedCols.length) {
       if (matchedCols.size < inputCols.length) {
-        val extraCols = inputCols.filterNot(col => matchedCols.contains(col.name))
-          .map(col => s"${toSQLId(col.name)}").mkString(", ")
-        throw QueryCompilationErrors.incompatibleDataToTableExtraStructFieldsError(
-          tableName, colPath.quoted, extraCols
-        )
+        if (colPath.isEmpty) {
+          val cannotFindCol = expectedCols.filter(col => !matchedCols.contains(col.name)).head.name
+          throw QueryCompilationErrors.incompatibleDataToTableCannotFindDataError(tableName,
+            cannotFindCol)
+        } else {
+          val extraCols = inputCols.filterNot(col => matchedCols.contains(col.name))
+            .map(col => s"${toSQLId(col.name)}").mkString(", ")
+          throw QueryCompilationErrors.incompatibleDataToTableExtraStructFieldsError(

Review Comment:
   I just want to align the v1 and v2 errors. Having extra columns here also implies a missing column.





[GitHub] [spark] cloud-fan commented on a diff in pull request #42220: [SPARK-44577][SQL] Fix INSERT BY NAME returns nonsensical error message

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #42220:
URL: https://github.com/apache/spark/pull/42220#discussion_r1298007906


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala:
##########
@@ -238,11 +238,17 @@ object TableOutputResolver {
 
     if (reordered.length == expectedCols.length) {
       if (matchedCols.size < inputCols.length) {
-        val extraCols = inputCols.filterNot(col => matchedCols.contains(col.name))
-          .map(col => s"${toSQLId(col.name)}").mkString(", ")
-        throw QueryCompilationErrors.incompatibleDataToTableExtraStructFieldsError(
-          tableName, colPath.quoted, extraCols
-        )
+        if (colPath.isEmpty) {
+          val cannotFindCol = expectedCols.filter(col => !matchedCols.contains(col.name)).head.name
+          throw QueryCompilationErrors.incompatibleDataToTableCannotFindDataError(tableName,

Review Comment:
   The missing-column error should only be thrown if the missing column does not have a default value.





[GitHub] [spark] Hisoka-X commented on pull request #42220: [SPARK-44577][SQL] Fix INSERT BY NAME returns nonsensical error message

Posted by "Hisoka-X (via GitHub)" <gi...@apache.org>.
Hisoka-X commented on PR #42220:
URL: https://github.com/apache/spark/pull/42220#issuecomment-1681529759

   Hi @cloud-fan @MaxGekk, shall we continue pushing this PR forward? Thanks.




[GitHub] [spark] cloud-fan commented on a diff in pull request #42220: [SPARK-44577][SQL] Fix INSERT BY NAME returns nonsensical error message

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #42220:
URL: https://github.com/apache/spark/pull/42220#discussion_r1297173214


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala:
##########
@@ -238,11 +238,17 @@ object TableOutputResolver {
 
     if (reordered.length == expectedCols.length) {
       if (matchedCols.size < inputCols.length) {
-        val extraCols = inputCols.filterNot(col => matchedCols.contains(col.name))
-          .map(col => s"${toSQLId(col.name)}").mkString(", ")
-        throw QueryCompilationErrors.incompatibleDataToTableExtraStructFieldsError(
-          tableName, colPath.quoted, extraCols
-        )
+        if (colPath.isEmpty) {
+          val cannotFindCol = expectedCols.filter(col => !matchedCols.contains(col.name)).head.name
+          throw QueryCompilationErrors.incompatibleDataToTableCannotFindDataError(tableName,
+            cannotFindCol)
+        } else {
+          val extraCols = inputCols.filterNot(col => matchedCols.contains(col.name))
+            .map(col => s"${toSQLId(col.name)}").mkString(", ")
+          throw QueryCompilationErrors.incompatibleDataToTableExtraStructFieldsError(

Review Comment:
   so the behavior is inconsistent between top-level columns and struct fields?





[GitHub] [spark] cloud-fan commented on a diff in pull request #42220: [SPARK-44577][SQL] Fix INSERT BY NAME returns nonsensical error message

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #42220:
URL: https://github.com/apache/spark/pull/42220#discussion_r1297917168


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala:
##########
@@ -238,11 +238,17 @@ object TableOutputResolver {
 
     if (reordered.length == expectedCols.length) {
       if (matchedCols.size < inputCols.length) {
-        val extraCols = inputCols.filterNot(col => matchedCols.contains(col.name))
-          .map(col => s"${toSQLId(col.name)}").mkString(", ")
-        throw QueryCompilationErrors.incompatibleDataToTableExtraStructFieldsError(
-          tableName, colPath.quoted, extraCols
-        )
+        if (colPath.isEmpty) {
+          val cannotFindCol = expectedCols.filter(col => !matchedCols.contains(col.name)).head.name
+          throw QueryCompilationErrors.incompatibleDataToTableCannotFindDataError(tableName,

Review Comment:
   When we reach here, why does it mean a missing column for top-level columns but an extra column for nested fields?





[GitHub] [spark] Hisoka-X commented on a diff in pull request #42220: [SPARK-44577][SQL] Fix INSERT BY NAME returns nonsensical error message

Posted by "Hisoka-X (via GitHub)" <gi...@apache.org>.
Hisoka-X commented on code in PR #42220:
URL: https://github.com/apache/spark/pull/42220#discussion_r1297961931


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala:
##########
@@ -238,11 +238,17 @@ object TableOutputResolver {
 
     if (reordered.length == expectedCols.length) {
       if (matchedCols.size < inputCols.length) {
-        val extraCols = inputCols.filterNot(col => matchedCols.contains(col.name))
-          .map(col => s"${toSQLId(col.name)}").mkString(", ")
-        throw QueryCompilationErrors.incompatibleDataToTableExtraStructFieldsError(
-          tableName, colPath.quoted, extraCols
-        )
+        if (colPath.isEmpty) {
+          val cannotFindCol = expectedCols.filter(col => !matchedCols.contains(col.name)).head.name
+          throw QueryCompilationErrors.incompatibleDataToTableCannotFindDataError(tableName,

Review Comment:
   > if (reordered.length == expectedCols.length) I think this means there is no missing col
   
   Yep, but there is a special V1 case, e.g. table [x, y, z] with input columns [x, y, k]: when `fillDefaultValue` is set to true, `reordered` becomes [x, y, null as z], so `reordered.length == expectedCols.length` holds.
   
   This PR only changes the error to `CANNOT_FIND_DATA`, but I'm not sure whether we should support a case like table [x, y, z] with input columns [x, y, k] by just filling `z` with `null`.
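The V1 reordering described here can be sketched in a few lines. This is an illustrative Python re-implementation, not Spark's actual code; the function name and the `null AS` placeholder representation are assumptions made for the sketch.

```python
# Illustrative re-implementation of the by-name reordering described above
# (NOT Spark's actual code). With default-value filling on, every expected
# column gets an entry, so the reordered list always reaches the expected
# length even when the input carries an unmatched extra column such as `k`.
def reorder_by_name(expected_cols, input_cols, fill_default_value):
    matched = set()
    reordered = []
    for col in expected_cols:
        if col in input_cols:
            matched.add(col)
            reordered.append(col)
        elif fill_default_value:
            # stand-in for filling the column with its default / NULL value
            reordered.append(f"null AS {col}")
    return reordered, matched

reordered, matched = reorder_by_name(["x", "y", "z"], ["x", "y", "k"], True)
# reordered has the same length as the expected columns, so the extra input
# column `k` is only caught later by the `matchedCols.size < inputCols.length`
# check quoted in the diff above.
```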
   
   
   





[GitHub] [spark] Hisoka-X commented on a diff in pull request #42220: [SPARK-44577][SQL] Fix INSERT BY NAME returns nonsensical error message

Posted by "Hisoka-X (via GitHub)" <gi...@apache.org>.
Hisoka-X commented on code in PR #42220:
URL: https://github.com/apache/spark/pull/42220#discussion_r1279004638


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala:
##########
@@ -263,16 +268,26 @@ object TableOutputResolver {
       val extraColsStr = inputCols.takeRight(inputCols.size - expectedCols.size)
         .map(col => toSQLId(col.name))
         .mkString(", ")
-      throw QueryCompilationErrors.incompatibleDataToTableExtraStructFieldsError(
-        tableName, colPath.quoted, extraColsStr
-      )
+      if (colPath.isEmpty) {
+        throw QueryCompilationErrors.cannotWriteTooManyColumnsToTableError(tableName,
+          expectedCols.map(_.name), inputCols.map(_.toAttribute))
+      } else {
+        throw QueryCompilationErrors.incompatibleDataToTableExtraStructFieldsError(
+          tableName, colPath.quoted, extraColsStr
+        )
+      }
     } else if (inputCols.size < expectedCols.size) {
       val missingColsStr = expectedCols.takeRight(expectedCols.size - inputCols.size)
         .map(col => toSQLId(col.name))
         .mkString(", ")
-      throw QueryCompilationErrors.incompatibleDataToTableStructMissingFieldsError(
-        tableName, colPath.quoted, missingColsStr
-      )
+      if (colPath.isEmpty) {

Review Comment:
   Yep, but the error thrown differs in each case. A helper method could only extract the `if` statement; should we still do this?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Hisoka-X commented on pull request #42220: [SPARK-44577][SQL] Fix INSERT BY NAME returns nonsensical error message

Posted by "Hisoka-X (via GitHub)" <gi...@apache.org>.
Hisoka-X commented on PR #42220:
URL: https://github.com/apache/spark/pull/42220#issuecomment-1656514557

   cc @cloud-fan 




[GitHub] [spark] Hisoka-X commented on a diff in pull request #42220: [SPARK-44577][SQL] Fix INSERT BY NAME returns nonsensical error message

Posted by "Hisoka-X (via GitHub)" <gi...@apache.org>.
Hisoka-X commented on code in PR #42220:
URL: https://github.com/apache/spark/pull/42220#discussion_r1296767839


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala:
##########
@@ -238,11 +238,17 @@ object TableOutputResolver {
 
     if (reordered.length == expectedCols.length) {
       if (matchedCols.size < inputCols.length) {
-        val extraCols = inputCols.filterNot(col => matchedCols.contains(col.name))
-          .map(col => s"${toSQLId(col.name)}").mkString(", ")
-        throw QueryCompilationErrors.incompatibleDataToTableExtraStructFieldsError(
-          tableName, colPath.quoted, extraCols
-        )
+        if (colPath.isEmpty) {
+          val cannotFindCol = expectedCols.filter(col => !matchedCols.contains(col.name)).head.name
+          throw QueryCompilationErrors.incompatibleDataToTableCannotFindDataError(tableName,
+            cannotFindCol)
+        } else {
+          val extraCols = inputCols.filterNot(col => matchedCols.contains(col.name))
+            .map(col => s"${toSQLId(col.name)}").mkString(", ")
+          throw QueryCompilationErrors.incompatibleDataToTableExtraStructFieldsError(

Review Comment:
   I think so; `QueryCompilationErrors.incompatibleDataToTableExtraStructFieldsError` only suits nested fields, not top-level columns.
   
   Its message template is:
   ```
   Cannot write extra fields `<extraFields>` to the struct `<colName>`.
   ```
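The top-level vs. nested-field dispatch under discussion can be sketched as follows. This is a hypothetical Python re-implementation; the function name and message wording are illustrative and only loosely modeled on Spark's error classes.

```python
# Hypothetical sketch of the error dispatch discussed in this thread
# (function name and message wording are illustrative, not Spark's API):
# an empty column path means we are at the top level, so report the table
# column that could not be matched; otherwise report the extra input
# fields of the struct located at `col_path`.
def pick_error(table_name, col_path, expected_cols, input_cols, matched):
    if not col_path:  # top-level columns: CANNOT_FIND_DATA-style message
        missing = next(c for c in expected_cols if c not in matched)
        return (f"Cannot find data for the output column `{missing}` "
                f"in table {table_name}.")
    # nested fields: EXTRA_STRUCT_FIELDS-style message
    extra = ", ".join(f"`{c}`" for c in input_cols if c not in matched)
    path = ".".join(col_path)
    return f"Cannot write extra fields {extra} to the struct `{path}`."

print(pick_error("bug", [], ["c1"], ["c2"], set()))
print(pick_error("bug", ["s"], ["a"], ["a", "b"], {"a"}))
```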
   





[GitHub] [spark] cloud-fan commented on a diff in pull request #42220: [SPARK-44577][SQL] Fix INSERT BY NAME returns nonsensical error message

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #42220:
URL: https://github.com/apache/spark/pull/42220#discussion_r1298007581


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala:
##########
@@ -238,11 +238,17 @@ object TableOutputResolver {
 
     if (reordered.length == expectedCols.length) {
       if (matchedCols.size < inputCols.length) {
-        val extraCols = inputCols.filterNot(col => matchedCols.contains(col.name))
-          .map(col => s"${toSQLId(col.name)}").mkString(", ")
-        throw QueryCompilationErrors.incompatibleDataToTableExtraStructFieldsError(
-          tableName, colPath.quoted, extraCols
-        )
+        if (colPath.isEmpty) {
+          val cannotFindCol = expectedCols.filter(col => !matchedCols.contains(col.name)).head.name
+          throw QueryCompilationErrors.incompatibleDataToTableCannotFindDataError(tableName,

Review Comment:
   >  eg: table [x, y, z], inputcol: [x, y, k],
   
   If `z` has a default value, I think it means an extra-column error (`k` is extra)?







[GitHub] [spark] cloud-fan commented on a diff in pull request #42220: [SPARK-44577][SQL] Fix INSERT BY NAME returns nonsensical error message

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #42220:
URL: https://github.com/apache/spark/pull/42220#discussion_r1297931858


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala:
##########
@@ -238,11 +238,17 @@ object TableOutputResolver {
 
     if (reordered.length == expectedCols.length) {
       if (matchedCols.size < inputCols.length) {
-        val extraCols = inputCols.filterNot(col => matchedCols.contains(col.name))
-          .map(col => s"${toSQLId(col.name)}").mkString(", ")
-        throw QueryCompilationErrors.incompatibleDataToTableExtraStructFieldsError(
-          tableName, colPath.quoted, extraCols
-        )
+        if (colPath.isEmpty) {
+          val cannotFindCol = expectedCols.filter(col => !matchedCols.contains(col.name)).head.name
+          throw QueryCompilationErrors.incompatibleDataToTableCannotFindDataError(tableName,

Review Comment:
   Where do we check the missing-columns error for nested fields?





[GitHub] [spark] cloud-fan commented on a diff in pull request #42220: [SPARK-44577][SQL] Fix INSERT BY NAME returns nonsensical error message

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #42220:
URL: https://github.com/apache/spark/pull/42220#discussion_r1311212475


##########
common/utils/src/main/resources/error/error-classes.json:
##########
@@ -1025,6 +1025,11 @@
           "Cannot safely cast <colName> <srcType> to <targetType>."
         ]
       },
+      "EXTRA_FIELDS" : {
+        "message" : [
+          "Cannot write extra fields <colName>."

Review Comment:
   it's one column or multiple columns?





[GitHub] [spark] cloud-fan commented on a diff in pull request #42220: [SPARK-44577][SQL] Fix INSERT BY NAME returns nonsensical error message

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #42220:
URL: https://github.com/apache/spark/pull/42220#discussion_r1311212214


##########
common/utils/src/main/resources/error/error-classes.json:
##########
@@ -1025,6 +1025,11 @@
           "Cannot safely cast <colName> <srcType> to <targetType>."
         ]
       },
+      "EXTRA_FIELDS" : {

Review Comment:
   `EXTRA_COLUMNS`?





[GitHub] [spark] cloud-fan commented on pull request #42220: [SPARK-44577][SQL] Fix INSERT BY NAME returns nonsensical error message

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on PR #42220:
URL: https://github.com/apache/spark/pull/42220#issuecomment-1702666840

   thanks, merging to master/3.5!




[GitHub] [spark] Hisoka-X commented on a diff in pull request #42220: [SPARK-44577][SQL] Fix INSERT BY NAME returns nonsensical error message

Posted by "Hisoka-X (via GitHub)" <gi...@apache.org>.
Hisoka-X commented on code in PR #42220:
URL: https://github.com/apache/spark/pull/42220#discussion_r1311229461


##########
common/utils/src/main/resources/error/error-classes.json:
##########
@@ -1025,6 +1025,11 @@
           "Cannot safely cast <colName> <srcType> to <targetType>."
         ]
       },
+      "EXTRA_FIELDS" : {
+        "message" : [
+          "Cannot write extra fields <colName>."

Review Comment:
   Multiple columns, so I made some changes; please check it again. Thanks.
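A multi-column placeholder in a message template can be illustrated with a minimal renderer. This is illustrative only; `render` and the template text are assumptions for the sketch, not Spark's actual error-class machinery.

```python
# Minimal template renderer illustrating a placeholder that carries a
# comma-separated list of columns (illustrative only; not Spark's
# error-class machinery).
def render(template, **params):
    msg = template
    for key, value in params.items():
        msg = msg.replace(f"<{key}>", value)
    return msg

extra = ", ".join(f"`{c}`" for c in ["c2", "c3"])
msg = render("Cannot write extra columns <extraColumns>.", extraColumns=extra)
# -> Cannot write extra columns `c2`, `c3`.
```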


