Posted to issues@impala.apache.org by "Csaba Ringhofer (JIRA)" <ji...@apache.org> on 2017/12/01 17:35:00 UTC
[jira] [Created] (IMPALA-6266) Runtime filters should not have non-deterministic expression on consumer side
Csaba Ringhofer created IMPALA-6266:
---------------------------------------
Summary: Runtime filters should not have non-deterministic expression on consumer side
Key: IMPALA-6266
URL: https://issues.apache.org/jira/browse/IMPALA-6266
Project: IMPALA
Issue Type: Bug
Components: Frontend
Affects Versions: Impala 2.10.0
Reporter: Csaba Ringhofer
Random (non-deterministic) expressions on the consumer side of a runtime filter are evaluated independently of the "final" join, which gives each row one extra chance to be dropped. As a result, the same query can return fewer or different rows when the runtime filter is used than when it is not.
Example:
use tpch_parquet;
set DISABLE_ROW_RUNTIME_FILTERING=0;
select count(*) from supplier join nation on s_nationkey + cast(rand()*2 as int) = n_nationkey;
result: 9722
set DISABLE_ROW_RUNTIME_FILTERING=1;
select count(*) from supplier join nation on s_nationkey + cast(rand()*2 as int) = n_nationkey;
result: 9803
(rand() is pseudo-random, so running the same query without changing the query option always returns the same result.)
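A minimal Python simulation of the mechanism described in the first paragraph, loosely modeled on the example query. Assumptions: 25 nation keys 0..24, supplier keys spread evenly over them, and a scan-side filter that acts as an exact set-membership test (a real runtime filter is typically a Bloom filter, which can let extra rows through but never drops a matching one). `random_offset` and `count_join` are hypothetical helpers, not Impala code.

```python
import random

# Assumption: 25 nations with keys 0..24, as in TPC-H.
NATION_KEYS = set(range(25))


def random_offset(rng):
    # Stand-in for cast(rand()*2 as int): 0 or 1 with equal probability.
    return int(rng.random() * 2)


def count_join(supplier_keys, use_runtime_filter, seed=0):
    """Count rows matching s_nationkey + offset = n_nationkey (hypothetical model)."""
    rng = random.Random(seed)  # seeded, so reruns are reproducible like rand()
    count = 0
    for s in supplier_keys:
        if use_runtime_filter:
            # The scan-side filter evaluates the probe expression on its own...
            if s + random_offset(rng) not in NATION_KEYS:
                continue  # row dropped by the runtime filter
        # ...and the join evaluates it again with a fresh random value.
        if s + random_offset(rng) in NATION_KEYS:
            count += 1
    return count


supplier_keys = [i % 25 for i in range(10000)]
print(count_join(supplier_keys, use_runtime_filter=False))
print(count_join(supplier_keys, use_runtime_filter=True))
```

In this model only suppliers with key 24 can fail to match. Without the filter such a row survives with probability 1/2; with the filter it must draw 0 twice, so it survives with probability 1/4. With 400 such rows out of 10,000, that is roughly 9,800 versus 9,700 rows in expectation, in line with the two results reported above.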
Optimizations like runtime filters should have no effect on the results, even in the case of non-deterministic expressions.
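One way to honor the rule in the summary is for the planner to refuse to place a runtime filter on a consumer-side expression that is not deterministic. The sketch below illustrates that check on a toy expression tree; the `Expr` class, its `nondeterministic` flag and `can_apply_runtime_filter` are hypothetical names and do not reflect Impala's actual frontend API.

```python
from dataclasses import dataclass, field


@dataclass
class Expr:
    """Toy expression node (hypothetical, not Impala's Expr class)."""
    name: str
    children: list = field(default_factory=list)
    nondeterministic: bool = False  # e.g. True for a rand() call

    def is_deterministic(self):
        # An expression is deterministic only if it and all its children are.
        return not self.nondeterministic and all(
            c.is_deterministic() for c in self.children
        )


def can_apply_runtime_filter(probe_expr):
    # Only let the scan filter on probe_expr if re-evaluating it there is
    # guaranteed to produce the same value the join will see.
    return probe_expr.is_deterministic()


# s_nationkey + cast(rand()*2 as int)
rand_key = Expr("+", [
    Expr("s_nationkey"),
    Expr("cast", [Expr("*", [Expr("rand", nondeterministic=True), Expr("2")])]),
])
print(can_apply_runtime_filter(rand_key))              # False
print(can_apply_runtime_filter(Expr("s_nationkey")))   # True
```

Skipping the filter trades away the scan-time pruning for such joins, which is acceptable because correctness must not depend on whether the optimization fires.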
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)