Posted to user@spark.apache.org by Yeachan Park <ye...@gmail.com> on 2023/04/05 16:32:27 UTC

Raise exception whilst casting instead of defaulting to null

Hi all,

The default behaviour of Spark is to return null for casts that fail,
unless ANSI mode (spark.sql.ansi.enabled) is on; see SPARK-30292
<https://issues.apache.org/jira/browse/SPARK-30292>.

Whilst I understand that this is a subset of ANSI-compliant behaviour, I
don't understand why the two are so tightly coupled. Enabling ANSI mode also
comes with other consequences that fall outside casting behaviour, and not
all Spark operations go through the SQL interface (i.e. spark.sql("") ).

I can imagine it'd be a pretty useful feature to have something like an
extra arg that raises an exception if the cast fails (e.g. *df.age.cast("int",
strict=True)*, since "raise" is a reserved word in Python), without having to
enable ANSI mode globally.
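To make the idea concrete, here is a rough sketch of the semantics I have in
mind, in plain Python rather than the actual Spark API (the "strict" flag is
hypothetical, not an existing Spark argument):

```python
def cast_to_int(value, strict=False):
    """Cast `value` to int; return None on failure (Spark's default),
    or raise if strict=True (the proposed opt-in behaviour)."""
    if value is None:
        return None  # a null input stays null in both modes
    try:
        return int(value)
    except (ValueError, TypeError):
        if strict:
            # proposed: surface the bad value instead of hiding it
            raise ValueError(f"invalid cast to int: {value!r}")
        return None  # Spark's default: silently produce null

rows = ["42", "abc", None]
print([cast_to_int(v) for v in rows])  # default mode: [42, None, None]
```

With strict=True the same call on "abc" would raise instead of quietly
inserting a null, which is all the proposal asks for, per column, without
flipping the session-wide ANSI switch.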

Does anyone know why this approach was chosen/have I missed something?
Would others find something like this useful?

Thanks,
Yeachan