Posted to issues@arrow.apache.org by "Francois Saint-Jacques (Jira)" <ji...@apache.org> on 2019/11/05 13:29:00 UTC

[jira] [Comment Edited] (ARROW-7047) [C++][Dataset] Filter expressions should not require exact type match

    [ https://issues.apache.org/jira/browse/ARROW-7047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16967527#comment-16967527 ] 

Francois Saint-Jacques edited comment on ARROW-7047 at 11/5/19 1:28 PM:
------------------------------------------------------------------------

In typical databases, this is not the responsibility of physical operators. It is assumed that inputs are properly typed/cast by whoever generates the physical plan. I would propose that we create an interface to bind/validate/cast an expression to a Schema, e.g.:


{code:c++}
/// \brief Bind an expression to a schema
///
/// Binding will try to align the types of expressions and referenced fields. It will also check that all references are valid.
Result<Expression> Expression::Bind(const Schema& schema, BindOptions options);
{code}

This utility could be used both by high-level languages and by the planner (execution engine).
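To make the proposal more concrete, here is a minimal sketch of what binding could do, written against a toy expression model rather than Arrow's actual classes. {{Type}}, {{Schema}}, {{Expr}}, {{CommonNumericType}}, {{CastTo}} and this {{Bind}} are all hypothetical: scalars carry only their type, errors are reported with exceptions instead of {{Result<>}}, and the widening order int32 < int64 < float64 is an assumption. Binding looks up each field reference's type in the schema, rejects unknown fields, and inserts explicit cast nodes so that both operands of an addition or comparison share a common type.

{code:c++}
// Toy expression model for illustration only; these are NOT Arrow's classes.
#include <cassert>
#include <map>
#include <memory>
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>

// Enumerator order doubles as the assumed widening order: Int32 < Int64 < Float64.
enum class Type { Boolean, Int32, Int64, Float64 };
using Schema = std::map<std::string, Type>;  // column name -> column type

struct Expr;
using ExprPtr = std::shared_ptr<const Expr>;

struct Expr {
  enum class Kind { FieldRef, Scalar, Cast, Add, Greater };
  Kind kind;
  std::string name;          // FieldRef only: referenced column name
  std::optional<Type> type;  // unset for an unbound FieldRef, Add or Greater
  ExprPtr lhs, rhs;          // operands (Cast uses lhs only)
};

ExprPtr Make(Expr::Kind k, std::string name, std::optional<Type> t,
             ExprPtr l = nullptr, ExprPtr r = nullptr) {
  return std::make_shared<Expr>(Expr{k, std::move(name), t, std::move(l), std::move(r)});
}

// User-facing constructors: note that a field reference has no type yet.
ExprPtr FieldRef(const std::string& n) { return Make(Expr::Kind::FieldRef, n, std::nullopt); }
ExprPtr Scalar(Type t) { return Make(Expr::Kind::Scalar, "", t); }
ExprPtr Add(ExprPtr a, ExprPtr b) { return Make(Expr::Kind::Add, "", std::nullopt, a, b); }
ExprPtr Greater(ExprPtr a, ExprPtr b) { return Make(Expr::Kind::Greater, "", std::nullopt, a, b); }

// Picks the wider of two numeric types; booleans are rejected.
Type CommonNumericType(Type a, Type b) {
  if (a == Type::Boolean || b == Type::Boolean)
    throw std::invalid_argument("boolean operands are not numeric");
  return static_cast<int>(a) > static_cast<int>(b) ? a : b;
}

// Wraps e in a Cast node unless it already has type t.
ExprPtr CastTo(ExprPtr e, Type t) {
  if (e->type == t) return e;
  return Make(Expr::Kind::Cast, "", t, e);
}

// Returns a new tree in which every node has a type and operands agree.
ExprPtr Bind(const ExprPtr& e, const Schema& schema) {
  assert(e != nullptr);
  switch (e->kind) {
    case Expr::Kind::FieldRef: {
      auto it = schema.find(e->name);
      if (it == schema.end())
        throw std::invalid_argument("no field named '" + e->name + "' in schema");
      return Make(Expr::Kind::FieldRef, e->name, it->second);
    }
    case Expr::Kind::Scalar:
      return e;
    case Expr::Kind::Cast:
      return Make(Expr::Kind::Cast, "", e->type, Bind(e->lhs, schema));
    case Expr::Kind::Add:
    case Expr::Kind::Greater: {
      ExprPtr l = Bind(e->lhs, schema);
      ExprPtr r = Bind(e->rhs, schema);
      Type common = CommonNumericType(*l->type, *r->type);
      Type result = e->kind == Expr::Kind::Add ? common : Type::Boolean;
      return Make(e->kind, "", result, CastTo(l, common), CastTo(r, common));
    }
  }
  throw std::logic_error("unreachable");
}
{code}

For example, binding {{(field_ref("col1") + field_ref("col2")) > scalar(42)}} against a schema where col1 is int64 and col2 is float64 casts col1 to float64 inside the sum and then casts the int32 literal to float64 for the comparison: the type the user could not have known when writing the expression is resolved once, at bind time, instead of inside each physical operator.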


> [C++][Dataset] Filter expressions should not require exact type match
> ---------------------------------------------------------------------
>
>                 Key: ARROW-7047
>                 URL: https://issues.apache.org/jira/browse/ARROW-7047
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: C++ - Dataset
>            Reporter: Neal Richardson
>            Assignee: Ben Kietzman
>            Priority: Major
>
> It's not trivial for users to ensure that scalars are of identical type to the fields they relate to in Expressions. For one, FieldExpressions don't contain a type reference, so when I construct {{field_ref("col1") > scalar(42)}}, I don't know exactly what type col1 is and so can't ensure that scalar(42) matches. Even if it were available, I wouldn't be able to determine what type to make it if the expression were {{(field_ref("col1") + field_ref("col2")) > scalar(42)}}.
> We should allow CompareExpressions to cast the inputs as necessary. Casting should be allowed among integer types, among floating point types, and between integers and floats. Likewise among date/timestamp types; and when comparing a string scalar against a date/timestamp column, the string should probably be parsed as a datetime. We also need to think about DictionaryTypes (though in practice this is moot until we have comparison kernels that work on strings).
> [~fsaintjacques][~bkietz]
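The scalar-side coercions requested in the issue could look roughly like the following sketch. It is hypothetical throughout ({{ScalarValue}}, {{ColumnType}}, {{ParseDate}} and {{CoerceScalar}} are illustrative names, not Arrow's Scalar or cast APIs): an integer scalar is widened to float64 for a floating point column, and a string scalar compared against a date column is parsed as {{YYYY-MM-DD}} into days since 1970-01-01, the same representation Arrow uses for date32. The date arithmetic is Howard Hinnant's days_from_civil algorithm.

{code:c++}
// Hypothetical scalar coercion helpers for illustration; NOT Arrow's Scalar or cast API.
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <optional>
#include <string>
#include <variant>

struct Date32 { std::int32_t days; };  // days since 1970-01-01, like Arrow's date32
using ScalarValue = std::variant<std::int64_t, double, std::string, Date32>;
enum class ColumnType { Int64, Float64, Date32 };

// Civil date -> days since 1970-01-01 (Howard Hinnant's days_from_civil).
std::int64_t DaysFromCivil(std::int64_t y, unsigned m, unsigned d) {
  assert(m >= 1 && m <= 12 && d >= 1 && d <= 31);
  y -= m <= 2;  // treat Jan/Feb as months 13/14 of the previous year
  const std::int64_t era = (y >= 0 ? y : y - 399) / 400;
  const unsigned yoe = static_cast<unsigned>(y - era * 400);            // [0, 399]
  const unsigned doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;  // [0, 365]
  const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;           // [0, 146096]
  return era * 146097 + static_cast<std::int64_t>(doe) - 719468;
}

// Parses exactly "YYYY-MM-DD". Only a coarse 1..31 day check is done,
// so an impossible date such as "2019-02-30" is not rejected here.
std::optional<Date32> ParseDate(const std::string& s) {
  int y = 0;
  unsigned m = 0, d = 0;
  char tail = 0;  // any trailing character makes sscanf return 4
  if (std::sscanf(s.c_str(), "%4d-%2u-%2u%c", &y, &m, &d, &tail) != 3) return std::nullopt;
  if (m < 1 || m > 12 || d < 1 || d > 31) return std::nullopt;
  return Date32{static_cast<std::int32_t>(DaysFromCivil(y, m, d))};
}

// Converts a scalar to the column's type when that is lossless; otherwise
// returns nullopt and leaves the decision (e.g. casting the column) to the caller.
std::optional<ScalarValue> CoerceScalar(const ScalarValue& s, ColumnType target) {
  switch (target) {
    case ColumnType::Int64:
      if (std::holds_alternative<std::int64_t>(s)) return s;
      return std::nullopt;  // e.g. 3.5: the column side would need a float64 cast
    case ColumnType::Float64:
      if (auto* i = std::get_if<std::int64_t>(&s)) return ScalarValue{static_cast<double>(*i)};
      if (std::holds_alternative<double>(s)) return s;
      return std::nullopt;
    case ColumnType::Date32:
      if (std::holds_alternative<Date32>(s)) return s;
      if (auto* str = std::get_if<std::string>(&s)) {
        if (auto date = ParseDate(*str)) return ScalarValue{*date};
      }
      return std::nullopt;
  }
  return std::nullopt;
}
{code}

When the scalar cannot be converted without loss (e.g. 3.5 against an int64 column) the sketch returns nothing; a fuller implementation would cast the column side to float64 instead. Timestamp units, time zones and dictionary columns are deliberately left out.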



--
This message was sent by Atlassian Jira
(v8.3.4#803005)