Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/03/21 16:13:39 UTC

[GitHub] [spark] gengliangwang opened a new pull request #35926: [SPARK-38616][SQL] Keep track of SQL query text in Catalyst TreeNode

gengliangwang opened a new pull request #35926:
URL: https://github.com/apache/spark/pull/35926

### What changes were proposed in this pull request?

Spark SQL uses the class Origin for tracking the position of each TreeNode in the SQL query text. When there is a parser error, we can show the position info in the error message:
```
> sql("create tabe foo(i int)")
org.apache.spark.sql.catalyst.parser.ParseException:
no viable alternative at input 'create tabe'(line 1, pos 7)

== SQL ==
create tabe foo(i int)
-------^^^
```
Origin contains two fields: line and startPosition. This is enough for the parser, since the SQL query text is known at parse time.

However, the SQL query text is not available during the execution phase, so Spark SQL can't show the problematic SQL clause when an ANSI runtime failure occurs.
This PR adds the query text to Origin. With it, Expressions that can throw runtime exceptions when ANSI mode is on can include the failing position in their error messages.
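
The idea can be illustrated with a minimal sketch. It is written in Python for brevity and is not Spark's Scala implementation: the `Origin` class below is a simplified stand-in, the field name `sql_text` and the helpers `context` and `divide` are hypothetical, and the rendering simply mimics the parser output shown above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Origin:
    """Simplified stand-in for Catalyst's Origin; not the real Scala class."""
    line: Optional[int] = None            # 1-based line number in the query
    start_position: Optional[int] = None  # 0-based column within that line
    sql_text: Optional[str] = None        # hypothetical new field: the query text

    def context(self) -> str:
        """Render the query with a marker under the recorded position."""
        if self.line is None or self.start_position is None or self.sql_text is None:
            return ""  # nothing to show without both a position and the text
        out = ["== SQL =="]
        for number, text in enumerate(self.sql_text.split("\n"), start=1):
            out.append(text)
            if number == self.line:
                out.append("-" * self.start_position + "^^^")
        return "\n".join(out)


def divide(left: float, right: float, origin: Origin) -> float:
    """Mimic an ANSI-mode division that reports where in the query it failed."""
    if right == 0:
        raise ArithmeticError("divide by zero\n" + origin.context())
    return left / right
```

Because the origin carries the query text, a runtime failure can print the same kind of marker that the parser prints today. Without the text, `context()` returns an empty string and the message falls back to the bare "divide by zero".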

### Why are the changes needed?

Currently, there is not enough error context for runtime ANSI failures.

In the following example, the error message only reports a "divide by zero" error, without pointing out which part of the SQL statement caused it.
```
> SELECT
ss1.ca_county,
ss1.d_year,
ws2.web_sales / ws1.web_sales web_q1_q2_increase,
ss2.store_sales / ss1.store_sales store_q1_q2_increase,
ws3.web_sales / ws2.web_sales web_q2_q3_increase,
ss3.store_sales / ss2.store_sales store_q2_q3_increase
FROM
ss ss1, ss ss2, ss ss3, ws ws1, ws ws2, ws ws3
WHERE
ss1.d_qoy = 1
AND ss1.d_year = 2000
AND ss1.ca_county = ss2.ca_county
AND ss2.d_qoy = 2
AND ss2.d_year = 2000
AND ss2.ca_county = ss3.ca_county
AND ss3.d_qoy = 3
AND ss3.d_year = 2000
```
```
org.apache.spark.SparkArithmeticException: divide by zero
    at org.apache.spark.sql.errors.QueryExecutionErrors$.divideByZeroError(QueryExecutionErrors.scala:140)
    at org.apache.spark.sql.catalyst.expressions.DivModLike.eval(arithmetic.scala:437)
    at org.apache.spark.sql.catalyst.expressions.DivModLike.eval$(arithmetic.scala:425)
    at org.apache.spark.sql.catalyst.expressions.Divide.eval(arithmetic.scala:534)
```
This is the initial PR for the project https://issues.apache.org/jira/browse/SPARK-38615

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Unit tests

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] gengliangwang commented on pull request #35926: [WIP][SPARK-38616][SQL] Keep track of SQL query text in Catalyst TreeNode

Posted by GitBox <gi...@apache.org>.

gengliangwang commented on pull request #35926:
URL: https://github.com/apache/spark/pull/35926#issuecomment-1075137032


This approach can consume a lot of memory if the query text is long. I created a new PR https://github.com/apache/spark/pull/35926 which uses memory more efficiently.



[GitHub] [spark] gengliangwang closed pull request #35926: [WIP][SPARK-38616][SQL] Keep track of SQL query text in Catalyst TreeNode

Posted by GitBox <gi...@apache.org>.

gengliangwang closed pull request #35926:
URL: https://github.com/apache/spark/pull/35926
