Posted to issues@spark.apache.org by "Xiao Li (JIRA)" <ji...@apache.org> on 2016/10/08 18:46:20 UTC

[jira] [Closed] (SPARK-10703) Physical filter operators should replace the general AND/OR/equality/etc with a special version that treats null as false

     [ https://issues.apache.org/jira/browse/SPARK-10703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiao Li closed SPARK-10703.
---------------------------
    Resolution: Not A Problem

> Physical filter operators should replace the general AND/OR/equality/etc with a special version that treats null as false
> -------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-10703
>                 URL: https://issues.apache.org/jira/browse/SPARK-10703
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.4.0
>            Reporter: Mingyu Kim
>
> {noformat}
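> // Assumes a Spark 1.4 spark-shell session, where sqlContext.implicits._ is already in scope.
> import org.apache.spark.sql.functions.callUDF
> import org.apache.spark.sql.types.BooleanType
> val df = Seq(("moose","ice"), (null,"fire")).toDF("animals", "elements")
> df.filter($"animals".rlike(".*"))
>   .filter(callUDF({(value: String) => value.length > 2}, BooleanType, $"animals"))
>   .collect()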
> {noformat}
> This code throws an NPE because:
> * Catalyst combines the two filters into a single AND
> * the first filter returns null (not false) for the row whose animals value is null
> * the second filter is still evaluated on that row and tries to read the length of that null
> This feels weird. Reading that code, I wouldn't expect null to be passed to the second filter. Even weirder is that if you call collect() after the first filter you won't see nulls, and if you write the data to disk and reread it, the NPE won't happen.
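
The behaviour described above follows from SQL three-valued logic: NULL AND false is false, while NULL AND true is NULL, so a NULL on the left of an AND cannot short-circuit and the right side still has to be evaluated. The following is a minimal plain-Scala sketch of that reasoning, not Spark's actual code. The names ThreeValuedAnd, SqlBool, matchesAny, longerThanTwo and combined are hypothetical, and Option[Boolean] is used here only as a stand-in for a nullable SQL boolean.

```scala
// Hypothetical model of SQL three-valued logic; not Spark's implementation.
object ThreeValuedAnd {
  // None stands for SQL NULL; Some(b) stands for a known boolean.
  type SqlBool = Option[Boolean]

  // Stand-in for rlike(".*"): a NULL input yields NULL, not false.
  def matchesAny(s: String): SqlBool =
    if (s == null) None else Some(s.matches(".*"))

  // Stand-in for the UDF in the report: it dereferences its argument unconditionally.
  def longerThanTwo(s: String): SqlBool = Some(s.length > 2)

  // SQL AND: a false left side short-circuits, but a NULL left side cannot,
  // because the result still depends on the right side.
  def and(left: SqlBool, right: => SqlBool): SqlBool = left match {
    case Some(false) => Some(false)
    case Some(true)  => right
    case None        => right match {
      case Some(false) => Some(false)
      case _           => None
    }
  }

  // Both filters merged into one AND, as the report says Catalyst does.
  def combined(s: String): SqlBool = and(matchesAny(s), longerThanTwo(s))
}
```

In this model, ThreeValuedAnd.combined(null) reaches longerThanTwo with a null argument and fails with a NullPointerException, which mirrors the failure in the report.
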
> After the discussion on the dev list, [~rxin] suggested,
> {quote}
> we can add a rule for the physical filter operator to replace the general AND/OR/equality/etc with a special version that treats null as false. This rule needs to be carefully written because it should only apply to subtrees of AND/OR/equality/etc (e.g. it shouldn't rewrite children of isnull).
> {quote}
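
As a rough illustration of the suggestion quoted above, the sketch below shows what an AND that treats null as false could look like in plain Scala. It is an assumption-laden model, not a Catalyst rule or any existing Spark API: NullAsFalseAnd, SqlBool, matchesAny, longerThanTwo, andNullAsFalse and keepRow are hypothetical names, and Option[Boolean] again stands in for a nullable SQL boolean.

```scala
// Hypothetical sketch of a filter-only AND that treats NULL as false.
object NullAsFalseAnd {
  // None stands for SQL NULL; Some(b) stands for a known boolean.
  type SqlBool = Option[Boolean]

  // Stand-in for rlike(".*"): a NULL input yields NULL.
  def matchesAny(s: String): SqlBool =
    if (s == null) None else Some(s.matches(".*"))

  // Stand-in for the UDF in the report: it dereferences its argument unconditionally.
  def longerThanTwo(s: String): SqlBool = Some(s.length > 2)

  // A filter keeps a row only when the condition is true, so NULL and false
  // are interchangeable here. A NULL left side therefore short-circuits, and
  // the by-name right side is never evaluated for that row.
  def andNullAsFalse(left: SqlBool, right: => SqlBool): Boolean =
    left.getOrElse(false) && right.getOrElse(false)

  // Decide whether a row with the given animals value survives both filters.
  def keepRow(s: String): Boolean =
    andNullAsFalse(matchesAny(s), longerThanTwo(s))
}
```

Note that andNullAsFalse returns a plain Boolean, so the distinction between NULL and false is lost. That is only safe at the top of a filter condition; under an operator such as isnull, which must tell NULL apart from false, the rewrite would change the result, which is why the quoted suggestion restricts where the rule may apply.
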



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
