You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "itholic (via GitHub)" <gi...@apache.org> on 2023/09/01 02:38:04 UTC

[GitHub] [spark] itholic opened a new pull request, #42762: [SPARK-42309][SQL][FOLLOWUP] Raise exception directly instead of calling `resolveColumnsByPosition`

itholic opened a new pull request, #42762:
URL: https://github.com/apache/spark/pull/42762

   ### What changes were proposed in this pull request?
   
   This is followup for https://github.com/apache/spark/pull/39937 .
   
   ### Why are the changes needed?
   <!--
   Please clarify why the changes are needed. For instance,
     1. If you propose a new API, clarify the use case for a new API.
     2. If you fix a bug, you can clarify why it is a bug.
   -->
   We can directly throw error when resolving output columns if there is any error, instead of calling `resolveColumnsByPosition` again.
   
   
   ### Does this PR introduce _any_ user-facing change?
   <!--
   Note that it means *any* user-facing change including all aspects such as the documentation fix.
   If yes, please clarify the previous behavior and the change this PR proposes - provide the console output, description and/or an example to show the behavior difference if possible.
   If possible, please also clarify if this is a user-facing change compared to the released Spark versions or within the unreleased branches such as master.
   If no, write 'No'.
   -->
   No.
   
   ### How was this patch tested?
   <!--
   If tests were added, say they were added here. Please make sure to add some test cases that check the changes thoroughly including negative and positive cases if possible.
   If it was tested in a way different from regular unit tests, please clarify how you tested step by step, ideally copy and paste-able, so that other reviewers can test and check, and descendants can verify in the future.
   If tests were not added, please describe why they were not added and/or why it was difficult to add.
   If benchmark tests were added, please run the benchmarks in GitHub Actions for the consistent environment, and the instructions could accord to: https://spark.apache.org/developer-tools.html#github-workflow-benchmarks.
   -->
   The existing CI should pass.
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   <!--
   If generative AI tooling has been used in the process of authoring this patch, please include the
   phrase: 'Generated-by: ' followed by the name of the tool and its version.
   If no, write 'No'.
   Please refer to the [ASF Generative Tooling Guidance](https://www.apache.org/legal/generative-tooling.html) for details.
   -->
   
   No.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45476][SQL][FOLLOWUP] Raise exception directly instead of calling `resolveColumnsByPosition` [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon closed pull request #42762: [SPARK-45476][SQL][FOLLOWUP] Raise exception directly instead of calling `resolveColumnsByPosition`
URL: https://github.com/apache/spark/pull/42762


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] itholic commented on a diff in pull request #42762: [SPARK-42309][SQL][FOLLOWUP] Raise exception directly instead of calling `resolveColumnsByPosition`

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on code in PR #42762:
URL: https://github.com/apache/spark/pull/42762#discussion_r1312800194


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala:
##########
@@ -111,7 +111,9 @@ object TableOutputResolver {
     }
 
     if (errors.nonEmpty) {
-      resolveColumnsByPosition(tableName, query.output, actualExpectedCols, conf, errors += _)
+      throw QueryCompilationErrors.incompatibleDataToTableCannotFindDataError(

Review Comment:
   Thanks for testing! Let me add some test for this then.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] itholic commented on a diff in pull request #42762: [SPARK-42309][SQL][FOLLOWUP] Raise exception directly instead of calling `resolveColumnsByPosition`

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on code in PR #42762:
URL: https://github.com/apache/spark/pull/42762#discussion_r1312746260


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala:
##########
@@ -111,7 +111,9 @@ object TableOutputResolver {
     }
 
     if (errors.nonEmpty) {
-      resolveColumnsByPosition(tableName, query.output, actualExpectedCols, conf, errors += _)
+      throw QueryCompilationErrors.incompatibleDataToTableCannotFindDataError(

Review Comment:
   Seems like the test already in `sql/core/src/test/scala/org/apache/spark/sql/DataFrameWriterV2Suite.scala`, but the existing code also end up with `incompatibleDataToTableCannotFindDataError` eventually, so the test was not failed?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Hisoka-X commented on a diff in pull request #42762: [SPARK-42309][SQL][FOLLOWUP] Raise exception directly instead of calling `resolveColumnsByPosition`

Posted by "Hisoka-X (via GitHub)" <gi...@apache.org>.
Hisoka-X commented on code in PR #42762:
URL: https://github.com/apache/spark/pull/42762#discussion_r1312770758


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala:
##########
@@ -111,7 +111,9 @@ object TableOutputResolver {
     }
 
     if (errors.nonEmpty) {
-      resolveColumnsByPosition(tableName, query.output, actualExpectedCols, conf, errors += _)
+      throw QueryCompilationErrors.incompatibleDataToTableCannotFindDataError(

Review Comment:
   I don't think so, I add breakpoint for line 114, and debug DataFrameWriterV2Suite but never blocked, in fact, the test case never reached line 113 and line 114. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Hisoka-X commented on a diff in pull request #42762: [SPARK-42309][SQL][FOLLOWUP] Raise exception directly instead of calling `resolveColumnsByPosition`

Posted by "Hisoka-X (via GitHub)" <gi...@apache.org>.
Hisoka-X commented on code in PR #42762:
URL: https://github.com/apache/spark/pull/42762#discussion_r1312475968


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala:
##########
@@ -111,7 +111,9 @@ object TableOutputResolver {
     }
 
     if (errors.nonEmpty) {
-      resolveColumnsByPosition(tableName, query.output, actualExpectedCols, conf, errors += _)
+      throw QueryCompilationErrors.incompatibleDataToTableCannotFindDataError(

Review Comment:
   I'm a liitle confused, why currently test case can passed normally before this PR, is that we never can reach here by all test case? If so, should we add some test case to cover it?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [WIP][SPARK-42309][SQL][FOLLOWUP] Raise exception directly instead of calling `resolveColumnsByPosition` [spark]

Posted by "Hisoka-X (via GitHub)" <gi...@apache.org>.
Hisoka-X commented on code in PR #42762:
URL: https://github.com/apache/spark/pull/42762#discussion_r1349427325


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala:
##########
@@ -110,10 +110,6 @@ object TableOutputResolver {
       resolveColumnsByPosition(tableName, query.output, actualExpectedCols, conf, errors += _)
     }
 
-    if (errors.nonEmpty) {

Review Comment:
   So, seem like we can delete `errors` parameter? (If can not find any test case to prove it worked)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [WIP][SPARK-42309][SQL][FOLLOWUP] Raise exception directly instead of calling `resolveColumnsByPosition` [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #42762:
URL: https://github.com/apache/spark/pull/42762#discussion_r1351289594


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala:
##########
@@ -110,10 +110,6 @@ object TableOutputResolver {
       resolveColumnsByPosition(tableName, query.output, actualExpectedCols, conf, errors += _)
     }
 
-    if (errors.nonEmpty) {

Review Comment:
   I would jus tthrow an exception here since this is a followup of SPARK-42309. If we need to fix a different issue, we should have another jira.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] itholic commented on a diff in pull request #42762: [SPARK-42309][SQL][FOLLOWUP] Raise exception directly instead of calling `resolveColumnsByPosition`

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on code in PR #42762:
URL: https://github.com/apache/spark/pull/42762#discussion_r1312534942


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala:
##########
@@ -111,7 +111,9 @@ object TableOutputResolver {
     }
 
     if (errors.nonEmpty) {
-      resolveColumnsByPosition(tableName, query.output, actualExpectedCols, conf, errors += _)
+      throw QueryCompilationErrors.incompatibleDataToTableCannotFindDataError(

Review Comment:
   I'm not 100% sure, but I think strongly suspect that there are two possible reasons:
   - No test to cover this case.
   - It is being tested in an incorrect way somewhere.
   
   Let's waiting for the CI for now. I believe we could find some reason after then.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Hisoka-X commented on a diff in pull request #42762: [SPARK-42309][SQL][FOLLOWUP] Raise exception directly instead of calling `resolveColumnsByPosition`

Posted by "Hisoka-X (via GitHub)" <gi...@apache.org>.
Hisoka-X commented on code in PR #42762:
URL: https://github.com/apache/spark/pull/42762#discussion_r1312772412


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala:
##########
@@ -111,7 +111,9 @@ object TableOutputResolver {
     }
 
     if (errors.nonEmpty) {
-      resolveColumnsByPosition(tableName, query.output, actualExpectedCols, conf, errors += _)
+      throw QueryCompilationErrors.incompatibleDataToTableCannotFindDataError(

Review Comment:
   Maybe you should addd some test case? Or just try remove line 113 and line 114 to find out which test case failed. Then you can get result.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [WIP][SPARK-42309][SQL][FOLLOWUP] Raise exception directly instead of calling `resolveColumnsByPosition` [spark]

Posted by "Hisoka-X (via GitHub)" <gi...@apache.org>.
Hisoka-X commented on PR #42762:
URL: https://github.com/apache/spark/pull/42762#issuecomment-1754157169

   cc @HyukjinKwon @MaxGekk 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45476][SQL][FOLLOWUP] Raise exception directly instead of calling `resolveColumnsByPosition` [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #42762:
URL: https://github.com/apache/spark/pull/42762#issuecomment-1763558493

   Merged to master.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45476][SQL][FOLLOWUP] Raise exception directly instead of calling `resolveColumnsByPosition` [spark]

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on PR #42762:
URL: https://github.com/apache/spark/pull/42762#issuecomment-1761174980

   cc @cloud-fan 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] itholic commented on a diff in pull request #42762: [SPARK-42309][SQL][FOLLOWUP] Raise exception directly instead of calling `resolveColumnsByPosition`

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on code in PR #42762:
URL: https://github.com/apache/spark/pull/42762#discussion_r1312534942


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala:
##########
@@ -111,7 +111,9 @@ object TableOutputResolver {
     }
 
     if (errors.nonEmpty) {
-      resolveColumnsByPosition(tableName, query.output, actualExpectedCols, conf, errors += _)
+      throw QueryCompilationErrors.incompatibleDataToTableCannotFindDataError(

Review Comment:
   I'm not 100% sure, but I think strongly suspect that there are possible reason:
   - No test to cover this case.
   - It is being tested in an incorrect way somewhere.
   
   Let's waiting for the CI for now. I believe we could find some reason after then.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] itholic commented on a diff in pull request #42762: [SPARK-42309][SQL][FOLLOWUP] Raise exception directly instead of calling `resolveColumnsByPosition`

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on code in PR #42762:
URL: https://github.com/apache/spark/pull/42762#discussion_r1312534942


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala:
##########
@@ -111,7 +111,9 @@ object TableOutputResolver {
     }
 
     if (errors.nonEmpty) {
-      resolveColumnsByPosition(tableName, query.output, actualExpectedCols, conf, errors += _)
+      throw QueryCompilationErrors.incompatibleDataToTableCannotFindDataError(

Review Comment:
   I'm not 100% sure, but I possibly there are one of two reasons below:
   - No test to cover this case.
   - It is being tested in an incorrect way somewhere.
   
   Let's waiting for the CI for now. I believe we could find some reason after then.



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala:
##########
@@ -111,7 +111,9 @@ object TableOutputResolver {
     }
 
     if (errors.nonEmpty) {
-      resolveColumnsByPosition(tableName, query.output, actualExpectedCols, conf, errors += _)
+      throw QueryCompilationErrors.incompatibleDataToTableCannotFindDataError(

Review Comment:
   I'm not 100% sure, but I think possibly there are one of two reasons below:
   - No test to cover this case.
   - It is being tested in an incorrect way somewhere.
   
   Let's waiting for the CI for now. I believe we could find some reason after then.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [WIP][SPARK-42309][SQL][FOLLOWUP] Raise exception directly instead of calling `resolveColumnsByPosition` [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on code in PR #42762:
URL: https://github.com/apache/spark/pull/42762#discussion_r1351085843


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala:
##########
@@ -110,10 +110,6 @@ object TableOutputResolver {
       resolveColumnsByPosition(tableName, query.output, actualExpectedCols, conf, errors += _)
     }
 
-    if (errors.nonEmpty) {

Review Comment:
   Yeah, I think we can just remove it. Sorry for delaying, I've been stuck on Pandas 2.x support on PySpark side.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45476][SQL][FOLLOWUP] Raise exception directly instead of calling `resolveColumnsByPosition` [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on code in PR #42762:
URL: https://github.com/apache/spark/pull/42762#discussion_r1351516059


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala:
##########
@@ -110,10 +110,6 @@ object TableOutputResolver {
       resolveColumnsByPosition(tableName, query.output, actualExpectedCols, conf, errors += _)
     }
 
-    if (errors.nonEmpty) {

Review Comment:
   Make sense to me. Created new JIRA SPARK-45476 and update the PR title & description.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org