Posted to issues@impala.apache.org by "Csaba Ringhofer (JIRA)" <ji...@apache.org> on 2017/12/01 17:35:00 UTC

[jira] [Created] (IMPALA-6266) Runtime filters should not have non-deterministic expression on consumer side

Csaba Ringhofer created IMPALA-6266:
---------------------------------------

             Summary: Runtime filters should not have non-deterministic expression on consumer side
                 Key: IMPALA-6266
                 URL: https://issues.apache.org/jira/browse/IMPALA-6266
             Project: IMPALA
          Issue Type: Bug
          Components: Frontend
    Affects Versions: Impala 2.10.0
            Reporter: Csaba Ringhofer


Non-deterministic (random) expressions on the consumer side of a runtime filter are evaluated independently of the "final" join, which gives rows an extra chance to be dropped. As a result, the same query can return fewer or different rows when the runtime filter is used than when it is not.

Example:
use tpch_parquet;

set DISABLE_ROW_RUNTIME_FILTERING=0;
select count(*) from supplier join nation on s_nationkey + cast(rand()*2 as int) = n_nationkey;
result: 9722

set DISABLE_ROW_RUNTIME_FILTERING=1;
select count(*) from supplier join nation on s_nationkey + cast(rand()*2 as int) = n_nationkey;
result: 9803

(rand() is pseudo-random, so running the same query without changing the query option always returns the same result.)

Optimizations like runtime filters should have no effect on the results, even in case of non-deterministic expressions.
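
The effect can be illustrated outside Impala with a small simulation. This is only a sketch of the mechanism, not Impala's implementation: it assumes 10000 supplier rows with s_nationkey drawn uniformly from 0..24, nation keys 0..24, and a runtime filter that tests exact key membership. The names join_expr and count_join_rows are made up for the illustration.

```python
import random

NUM_NATIONS = 25        # assumption: n_nationkey takes the values 0..24
NUM_SUPPLIERS = 10000   # assumption: number of rows in supplier


def join_expr(s_nationkey, rng):
    """Mimics s_nationkey + cast(rand()*2 as int): adds 0 or 1 at random."""
    return s_nationkey + int(rng.random() * 2)


def count_join_rows(use_runtime_filter, seed=0):
    """Counts probe rows that survive the (optional) filter and the join."""
    rng = random.Random(seed)
    nation_keys = set(range(NUM_NATIONS))
    matched = 0
    for _ in range(NUM_SUPPLIERS):
        s_nationkey = rng.randrange(NUM_NATIONS)
        # The expression is evaluated twice with independent random draws:
        # once by the runtime filter and once by the join itself.
        # Both draws are always taken so that both modes consume the
        # same random stream and differ only in whether the filter applies.
        filter_value = join_expr(s_nationkey, rng)
        join_value = join_expr(s_nationkey, rng)
        if use_runtime_filter and filter_value not in nation_keys:
            continue  # dropped early by the filter
        if join_value in nation_keys:
            matched += 1
    return matched


if __name__ == "__main__":
    print("with runtime filter:   ", count_join_rows(True))
    print("without runtime filter:", count_join_rows(False))
```

Under these assumptions only rows with s_nationkey = 24 can fail to match. Without the filter such a row is lost with probability 1/2; with the filter it has to pass two independent draws and is lost with probability 3/4. The expected loss therefore grows from about 2% to about 3% of all rows, which is roughly the size of the gap between the two counts reported above.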


