Posted to issues@hive.apache.org by "Sergey Shelukhin (JIRA)" <ji...@apache.org> on 2016/08/26 03:02:21 UTC

[jira] [Comment Edited] (HIVE-14652) incorrect results for not in on partition columns

    [ https://issues.apache.org/jira/browse/HIVE-14652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15438371#comment-15438371 ] 

Sergey Shelukhin edited comment on HIVE-14652 at 8/26/16 3:02 AM:
------------------------------------------------------------------

The fix (and also a refactor of the class to not have a million-line method).
I have a vague feeling that most of the logic in this method is bogus, but I may just be missing something, because it apparently works. The main question is: why do we evaluate UDFs on partition values from the pruned set for the filters that we purport to remove, when we have just used the same filters to prune the partitions? One of two things should be true: either we cannot eliminate the filter, or the final result of all the expressions is known to be true (or does not matter). So we could insta-bail as soon as we see any disagreement after evaluation, or have a walk state that indicates the value doesn't matter.
I don't really know if that's the case or if I'm missing something here. 
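
To make the question concrete, here is a toy model of the argument above. It is not Hive code: partitions are plain dicts, the filter is a Python callable, and the names prune and not_in_bar are hypothetical. It only illustrates that re-evaluating a filter on the partitions it has just selected cannot yield anything but true.

```python
# Toy model, not Hive code: two partitions of table foo, keyed by column s.
partitions = [{"s": "foo"}, {"s": "bar"}]

def not_in_bar(part):
    # Models the filter: s NOT IN ('bar')
    return part["s"] not in ("bar",)

def prune(parts, predicate):
    # Keep only the partitions for which the filter holds.
    return [p for p in parts if predicate(p)]

pruned = prune(partitions, not_in_bar)

# Re-evaluate the same filter on the surviving partitions and collect
# the distinct outcomes.
results = {not_in_bar(p) for p in pruned}
print(pruned, results)
```

In this model the only distinct outcome after pruning is true, which is why evaluating the filter a second time looks redundant.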

So for now the fix is to change the new IN logic introduced by HIVE-11424 to follow the same twisted logic. 
Let's see what that breaks.

The problem is that HIVE-11424 changes IN to true if there's a column on the left side. However, as described above, this IN was used to filter the partitions, so in the NOT IN case, IN is guaranteed to be false. So, while the "regular" logic would have confirmed that and then applied NOT to the false constant, the current code results in NOT being applied to the true constant.
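
A minimal sketch of the two behaviors described in the previous paragraph, using the repro from the issue (s not in ('bar'), with only partition s='foo' surviving pruning). This is a simplified model, not the actual Hive code: the function names fold_in_regular, fold_in_shortcut and apply_not are hypothetical, and None stands for "cannot fold to a constant".

```python
def fold_in_regular(part_values, in_list):
    """Evaluate IN on every surviving partition value; fold only if all agree."""
    outcomes = {v in in_list for v in part_values}
    return outcomes.pop() if len(outcomes) == 1 else None  # None: cannot fold

def fold_in_shortcut(part_values, in_list):
    """Models the shortcut as described above: column on the left, so assume true."""
    return True

def apply_not(folded):
    # NOT over a folded constant; an unfoldable child stays unfoldable.
    return None if folded is None else not folded

surviving = ["foo"]   # partitions left after pruning with s NOT IN ('bar')
in_list = ("bar",)

regular = apply_not(fold_in_regular(surviving, in_list))    # NOT applied to false
shortcut = apply_not(fold_in_shortcut(surviving, in_list))  # NOT applied to true
print(regular, shortcut)
```

In this model the regular path folds the filter to true and keeps the rows, while the shortcut path folds it to false, which matches the empty result reported in the issue.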

cc [~jcamachorodriguez] [~ashutoshc]

EDIT: I think the old IN logic for a UDF on the left-hand side might also be broken in the same way; I need to take a look.


> incorrect results for not in on partition columns
> -------------------------------------------------
>
>                 Key: HIVE-14652
>                 URL: https://issues.apache.org/jira/browse/HIVE-14652
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 2.1.0, 2.2.0
>            Reporter: stephen sprague
>            Assignee: Sergey Shelukhin
>            Priority: Blocker
>         Attachments: HIVE-14652.patch
>
>
> {noformat}
> create table foo (i int) partitioned by (s string);
> insert overwrite table foo partition(s='foo') select cint from alltypesorc limit 10;
> insert overwrite table foo partition(s='bar') select cint from alltypesorc limit 10;
> select * from foo where s not in ('bar');
> {noformat}
> No results are returned. IN ... works correctly.


