Posted to issues@spark.apache.org by "Max Gekk (Jira)" <ji...@apache.org> on 2023/11/01 07:40:00 UTC

[jira] [Assigned] (SPARK-45022) Provide context for dataset API errors

     [ https://issues.apache.org/jira/browse/SPARK-45022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Max Gekk reassigned SPARK-45022:
--------------------------------

    Assignee: Max Gekk

> Provide context for dataset API errors
> --------------------------------------
>
>                 Key: SPARK-45022
>                 URL: https://issues.apache.org/jira/browse/SPARK-45022
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 4.0.0
>            Reporter: Peter Toth
>            Assignee: Max Gekk
>            Priority: Major
>              Labels: pull-request-available
>
> SQL failures already provide nice error context when there is a failure:
> {noformat}
> org.apache.spark.SparkArithmeticException: [DIVIDE_BY_ZERO] Division by zero. Use `try_divide` to tolerate divisor being 0 and return NULL instead. If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error.
> == SQL(line 1, position 1) ==
> a / b
> ^^^^^
> 	at org.apache.spark.sql.errors.QueryExecutionErrors$.divideByZeroError(QueryExecutionErrors.scala:201)
> 	at org.apache.spark.sql.errors.QueryExecutionErrors.divideByZeroError(QueryExecutionErrors.scala)
> ...
> {noformat}
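>
> For reference, the context above comes from a query along these lines (an illustrative reproduction, assuming an existing SparkSession named spark; the table and column names are arbitrary):
> {noformat}
> spark.conf.set("spark.sql.ansi.enabled", true)
> spark.sql("SELECT a / b FROM VALUES (1, 0) AS t(a, b)").show()
> {noformat}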
> We could add a similar user-friendly error context to Dataset APIs.
> For example, consider the following Spark app, SimpleApp.scala:
> {noformat}
>    1  import org.apache.spark.sql.SparkSession
>    2  import org.apache.spark.sql.functions._
>    3
>    4  object SimpleApp {
>    5    def main(args: Array[String]): Unit = {
>    6      val spark = SparkSession.builder.appName("Simple Application").config("spark.sql.ansi.enabled", true).getOrCreate()
>    7      import spark.implicits._
>    8
>    9      val c = col("a") / col("b")
>   10
>   11      Seq((1, 0)).toDF("a", "b").select(c).show()
>   12
>   13      spark.stop()
>   14    }
>   15  }
> {noformat}
> then the error context could be:
> {noformat}
> Exception in thread "main" org.apache.spark.SparkArithmeticException: [DIVIDE_BY_ZERO] Division by zero. Use `try_divide` to tolerate divisor being 0 and return NULL instead. If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error.
> == Dataset ==
> "div" was called from SimpleApp$.main(SimpleApp.scala:9)
> 	at org.apache.spark.sql.errors.QueryExecutionErrors$.divideByZeroError(QueryExecutionErrors.scala:201)
> 	at org.apache.spark.sql.catalyst.expressions.DivModLike.eval(arithmetic.scala:672)
> ...
> {noformat}
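>
> One way to obtain such a call site (a rough sketch only, not the actual implementation; the helper below and its package-filtering logic are hypothetical) is to walk the current stack trace when the Column is constructed and keep the first frame outside Spark's and the JDK's own packages:
> {noformat}
> // Hypothetical helper: capture the first non-Spark, non-JDK stack frame
> // at the point where an expression (e.g. a division) is created.
> def callSite(): String = {
>   val frame = Thread.currentThread().getStackTrace.find { f =>
>     val c = f.getClassName
>     !c.startsWith("org.apache.spark.") && !c.startsWith("java.") && !c.startsWith("scala.")
>   }
>   frame
>     .map(f => s"${f.getClassName}.${f.getMethodName}(${f.getFileName}:${f.getLineNumber})")
>     .getOrElse("<unknown>")
> }
> {noformat}
> The captured string could then be stored in the expression's origin and rendered in the error context, yielding the "was called from SimpleApp$.main(SimpleApp.scala:9)" line shown above.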



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
