Posted to issues@orc.apache.org by "Owen O'Malley (Jira)" <ji...@apache.org> on 2020/08/31 21:07:00 UTC

[jira] [Commented] (ORC-623) Potentially incorrect Sarg evaluation for not(in) and not(isNull)

    [ https://issues.apache.org/jira/browse/ORC-623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17188003#comment-17188003 ] 

Owen O'Malley commented on ORC-623:
-----------------------------------

Thank you very much for the bug report with the unit test cases. That helped a lot. In the first case, we weren't correctly handling some of the types when all of the values are null. The second case is actually a typical misunderstanding of how null works in SQL.

When int1 is null, "not int1 in (1)" evaluates to null. So the predicate returns either false (when int1 is 1) or null (when int1 is null), and thus no rows should be returned. For the record, it took me a long time to get it straight and I still get it wrong more than I'd like to admit.
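
To make the null handling concrete, here is a minimal Python sketch of SQL's three-valued logic applied to the column from the second test case (values alternate between null and 1). It is only an illustration and not ORC's Sarg code: the helper names sql_in, sql_not and where_keeps are hypothetical, and None stands in for SQL null.

```python
def sql_in(value, literals):
    """SQL IN under three-valued logic: True, False, or None (SQL null)."""
    if value is None:
        return None  # null IN (...) is null
    if any(lit is not None and lit == value for lit in literals):
        return True
    # A null in the literal list makes a non-match unknown rather than false.
    return None if any(lit is None for lit in literals) else False


def sql_not(truth):
    """SQL NOT: null stays null, otherwise negate."""
    return None if truth is None else (not truth)


def where_keeps(truth):
    """A WHERE clause keeps a row only when the predicate is exactly true."""
    return truth is True


# Column from the second test case: null or 1 depending on the row index.
int1_values = [None, 1, None, 1]

results = [sql_not(sql_in(v, [1])) for v in int1_values]
kept = [v for v, r in zip(int1_values, results) if where_keeps(r)]

print(results)  # [None, False, None, False]
print(kept)     # []
```

Since every row evaluates to either null or false, no row satisfies the predicate, so skipping such a row group does not lose any results.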

> Potentially incorrect Sarg evaluation for not(in) and not(isNull)
> -----------------------------------------------------------------
>
>                 Key: ORC-623
>                 URL: https://issues.apache.org/jira/browse/ORC-623
>             Project: ORC
>          Issue Type: Bug
>            Reporter: Shardul Mahadik
>            Priority: Major
>
> I seem to have stumbled upon two issues with respect to Sarg evaluation in ORC
> I have created two test cases at [https://github.com/shardulm94/orc/commit/b6d97cfa0325d2a14094456d338c942f61b887f2] for the same
> In the first case, applying {{not(isNull(column))}} on a column that has all null values seems to incorrectly mark the row group as needed. This is a rather benign issue though as some extra row groups are returned.
> In the second case, I create a column which has only 2 potential values, either null or 1 based on whether the row index is even or odd. So all row groups are guaranteed to have both null and 1. Applying {{not(in(column, 1))}} on this column incorrectly marks the row group as not needed. There are null values in the row group which should be matched by {{notIn(column, 1)}}. This is potentially causing some row groups to be filtered out incorrectly.


