You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@drill.apache.org by "Kunal Khatua (JIRA)" <ji...@apache.org> on 2018/11/14 00:01:00 UTC
[jira] [Commented] (DRILL-6849) Runtime filter queries with nested
broadcast returns wrong results
[ https://issues.apache.org/jira/browse/DRILL-6849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16685921#comment-16685921 ]
Kunal Khatua commented on DRILL-6849:
-------------------------------------
[~weijie]
I'm seeing a similar issue with a simple count over a join as well, with excess number of rows getting filtered. (SF100):
{code}
0: jdbc:drill:schema=dfs.par100> select count(*) from (select * from lineitem l, supplier s where l.l_suppkey = s.s_suppkey and s.s_acctbal <1000);
+------------+
| EXPR$0 |
+------------+
| 109307880 |
+------------+
1 row selected (10.645 seconds)
0: jdbc:drill:schema=dfs.par100> alter session set `exec.hashjoin.enable.runtime_filter`=true;
+-------+-----------------------------------------------+
| ok | summary |
+-------+-----------------------------------------------+
| true | exec.hashjoin.enable.runtime_filter updated. |
+-------+-----------------------------------------------+
1 row selected (0.275 seconds)
0: jdbc:drill:schema=dfs.par100> alter $<3>session set `exec.hashjoin.runtime_filter.max.waiting.time`=1000;
+-------+---------------------------------------------------------+
| ok | summary |
+-------+---------------------------------------------------------+
| true | exec.hashjoin.runtime_filter.max.waiting.time updated. |
+-------+---------------------------------------------------------+
1 row selected (0.155 seconds)
0: jdbc:drill:schema=dfs.par100> alter session set `exec.hashjoin.bloom_filter.fpp`=0.2;
+-------+------------------------------------------+
| ok | summary |
+-------+------------------------------------------+
| true | exec.hashjoin.bloom_filter.fpp updated. |
+-------+------------------------------------------+
1 row selected (0.142 seconds)
0: jdbc:drill:schema=dfs.par100> select count(*) from (select * from lineitem l, supplier s where l.l_suppkey = s.s_suppkey and s.s_acctbal <1000);
+---------+
| EXPR$0 |
+---------+
| 405870 |
+---------+
1 row selected (9.498 seconds)
0: jdbc:drill:schema=dfs.par100> select count(*) from (select * from lineitem l, supplier s where l.l_suppkey = s.s_suppkey and s.s_acctbal <1000);
+---------+
| EXPR$0 |
+---------+
| 405939 |
+---------+
1 row selected (9.617 seconds)
0: jdbc:drill:schema=dfs.par100> select count(*) from (select * from lineitem l, supplier s where l.l_suppkey = s.s_suppkey and s.s_acctbal <1000);
+---------+
| EXPR$0 |
+---------+
| 405794 |
+---------+
1 row selected (8.801 seconds)
{code}
> Runtime filter queries with nested broadcast returns wrong results
> ------------------------------------------------------------------
>
> Key: DRILL-6849
> URL: https://issues.apache.org/jira/browse/DRILL-6849
> Project: Apache Drill
> Issue Type: Bug
> Components: Execution - Flow
> Affects Versions: 1.15.0
> Reporter: Abhishek Ravi
> Assignee: weijie.tong
> Priority: Major
> Fix For: 1.15.0
>
>
> Running few queries on TPC-H SF100 data with latest changes in [PR #1504|[https://github.com/apache/drill/pull/1504].]
> I noticed that for a couple of queries, there are vast discrepancies in the results returned by queries with *Runtime Filter enabled* and without.
> Below are a couple of failing queries.
> h3. Query 1
> {code:sql}
> select
> p.p_mfgr,
> p.p_type,
> count(*) as num_parts
> from
> supplier s,
> part p,
> partsupp ps,
> nation n
> where
> n.n_nationkey = 15
> and n.n_nationkey = s.s_nationkey
> and s.s_suppkey = ps.ps_suppkey
> and ps.ps_partkey = p.p_partkey
> and p.p_type like '%STEEL%'
> group by
> p.p_mfgr,
> p.p_type
> order by
> p.p_mfgr,
> p.p_type;
> {code}
>
> h4. Expected Result (without Runtime filter)
> {noformat}
> +-----------------+---------------------------+------------+
> | p_mfgr | p_type | num_parts |
> +-----------------+---------------------------+------------+
> | Manufacturer#1 | ECONOMY ANODIZED STEEL | 4448 |
> | Manufacturer#1 | ECONOMY BRUSHED STEEL | 4308 |
> | Manufacturer#1 | ECONOMY BURNISHED STEEL | 4268 |
> | Manufacturer#1 | ECONOMY PLATED STEEL | 4266 |
> | Manufacturer#1 | ECONOMY POLISHED STEEL | 4302 |
> | Manufacturer#1 | LARGE ANODIZED STEEL | 4389 |
> | Manufacturer#1 | LARGE BRUSHED STEEL | 4254 |
> | Manufacturer#1 | LARGE BURNISHED STEEL | 4276 |
> | Manufacturer#1 | LARGE PLATED STEEL | 4351 |
> | Manufacturer#1 | LARGE POLISHED STEEL | 4281 |
> | Manufacturer#1 | MEDIUM ANODIZED STEEL | 4230 |
> | Manufacturer#1 | MEDIUM BRUSHED STEEL | 4247 |
> | Manufacturer#1 | MEDIUM BURNISHED STEEL | 4286 |
> | Manufacturer#1 | MEDIUM PLATED STEEL | 4269 |
> | Manufacturer#1 | MEDIUM POLISHED STEEL | 4274 |
> | Manufacturer#1 | PROMO ANODIZED STEEL | 4283 |
> | Manufacturer#1 | PROMO BRUSHED STEEL | 4221 |
> | Manufacturer#1 | PROMO BURNISHED STEEL | 4315 |
> | Manufacturer#1 | PROMO PLATED STEEL | 4361 |
> | Manufacturer#1 | PROMO POLISHED STEEL | 4213 |
> | Manufacturer#1 | SMALL ANODIZED STEEL | 4354 |
> | Manufacturer#1 | SMALL BRUSHED STEEL | 4159 |
> | Manufacturer#1 | SMALL BURNISHED STEEL | 4222 |
> ...
> ...
> 150 rows
> {noformat}
>
> h4. Actual Result
> {noformat}
> +-----------------+---------------------------+------------+
> | p_mfgr | p_type | num_parts |
> +-----------------+---------------------------+------------+
> | Manufacturer#1 | ECONOMY ANODIZED STEEL | 63 |
> | Manufacturer#1 | ECONOMY BRUSHED STEEL | 64 |
> | Manufacturer#1 | ECONOMY BURNISHED STEEL | 58 |
> | Manufacturer#1 | ECONOMY PLATED STEEL | 73 |
> | Manufacturer#1 | ECONOMY POLISHED STEEL | 59 |
> | Manufacturer#1 | LARGE ANODIZED STEEL | 60 |
> | Manufacturer#1 | LARGE BRUSHED STEEL | 62 |
> | Manufacturer#1 | LARGE BURNISHED STEEL | 47 |
> | Manufacturer#1 | LARGE PLATED STEEL | 51 |
> | Manufacturer#1 | LARGE POLISHED STEEL | 60 |
> | Manufacturer#1 | MEDIUM ANODIZED STEEL | 54 |
> | Manufacturer#1 | MEDIUM BRUSHED STEEL | 60 |
> | Manufacturer#1 | MEDIUM BURNISHED STEEL | 53 |
> | Manufacturer#1 | MEDIUM PLATED STEEL | 73 |
> | Manufacturer#1 | MEDIUM POLISHED STEEL | 65 |
> | Manufacturer#1 | PROMO ANODIZED STEEL | 71 |
> | Manufacturer#1 | PROMO BRUSHED STEEL | 67 |
> | Manufacturer#1 | PROMO BURNISHED STEEL | 64 |
> | Manufacturer#1 | PROMO PLATED STEEL | 47 |
> | Manufacturer#1 | PROMO POLISHED STEEL | 51 |
> | Manufacturer#1 | SMALL ANODIZED STEEL | 48 |
> | Manufacturer#1 | SMALL BRUSHED STEEL | 71 |
> | Manufacturer#1 | SMALL BURNISHED STEEL | 33 |
> ...
> ...
> 150 rows
> {noformat}
>
> h3. Query 2 (TPC-H 7 query)
> {code:sql}
> select
> supp_nation,
> cust_nation,
> l_year,
> sum(volume) as revenue
> from
> (
> select
> n1.n_name as supp_nation,
> n2.n_name as cust_nation,
> extract(year from l.l_shipdate) as l_year,
> l.l_extendedprice * (1 - l.l_discount) as volume
> from
> supplier s,
> lineitem l,
> orders o,
> customer c,
> nation n1,
> nation n2
> where
> s.s_suppkey = l.l_suppkey
> and o.o_orderkey = l.l_orderkey
> and c.c_custkey = o.o_custkey
> and s.s_nationkey = n1.n_nationkey
> and c.c_nationkey = n2.n_nationkey
> and (
> (n1.n_name = 'EGYPT' and n2.n_name = 'UNITED STATES')
> or (n1.n_name = 'UNITED STATES' and n2.n_name = 'EGYPT')
> )
> and l.l_shipdate between date '1995-01-01' and date '1996-12-31'
> ) as shipping
> group by
> supp_nation,
> cust_nation,
> l_year
> order by
> supp_nation,
> cust_nation,
> l_year;
> {code}
> h4. Expected Result
> {noformat}
> EGYPT UNITED STATES 1996 5.2735809932832E9
> UNITED STATES EGYPT 1996 5.320488357330402E9
> EGYPT UNITED STATES 1995 5.282512709079098E9
> UNITED STATES EGYPT 1995 5.3213732978949E9
> {noformat}
> h4. Actual Result
> {noformat}
> EGYPT UNITED STATES 1996 1.6282961704599997E7
> UNITED STATES EGYPT 1996 1.39809230059E7
> EGYPT UNITED STATES 1995 1.3606895522500003E7
> UNITED STATES EGYPT 1995 1.5241044473299999E7
> {noformat}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)