You are viewing a plain text version of this content. The canonical link for it is here.
Posted to jira@arrow.apache.org by "Mike Seddon (Jira)" <ji...@apache.org> on 2020/12/02 22:45:00 UTC

[jira] [Commented] (ARROW-10793) [Rust] [DataFusion] Decide on CAST behaviour for invalid inputs

    [ https://issues.apache.org/jira/browse/ARROW-10793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242793#comment-17242793 ] 

Mike Seddon commented on ARROW-10793:
-------------------------------------

I strongly feel that DataFusion should adopt the ANSI-style strict typing rather than silent error suppression and return of NULL values as intuitively (due to years of using DBMS) users expect that if an error was thrown then all operations were completed successfully.

The default Spark behavior was inherited from Hive SQL which I assume was originally built at Facebook where absolute precision was not necessarily important. As part of the Spark 3.0 release a huge amount of effort was put in to comply with ANSI standard SQL ([https://spark.apache.org/docs/3.0.0/sql-ref-ansi-compliance.html|https://spark.apache.org/docs/3.0.0/sql-ref-ansi-compliance.html)]) which is obviously a lot harder to retrofit than start with.

This also goes wider than type conversions as adopting ANSI SQL standards (including functionality like [https://github.com/apache/arrow/pull/8688] which I think requires a CASE statement in ANSI SQL) should maybe be agreed by the PMC to give a framework for assessing PRs against. Perhaps this ticket should be changed to a discussion of which dialect of SQL DataFusion aims to support.

> [Rust] [DataFusion] Decide on CAST behaviour for invalid inputs
> ---------------------------------------------------------------
>
>                 Key: ARROW-10793
>                 URL: https://issues.apache.org/jira/browse/ARROW-10793
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Rust - DataFusion
>            Reporter: Andy Grove
>            Priority: Major
>
> This is a placeholder for now. See discussion on [https://github.com/apache/arrow/pull/8794]
> Briefly, the issue is do we want CAST to return null for invalid inputs or throw an error. Spark has different behavior depending on whether ANSI mode is enabled or not.
> I'm not sure if this is a DataFusion specific or a more general Arrow issue yet. It needs a discussion.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)