You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hive.apache.org by Yingyi Bu <bu...@gmail.com> on 2010/08/13 16:52:19 UTC
Filter Operator appear twice?
Hi,
Consider the following query:
INSERT OVERWRITE TABLE q1_pricing_summary_report
SELECT
L_RETURNFLAG, L_LINESTATUS, SUM(L_QUANTITY), SUM(L_EXTENDEDPRICE),
SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT)),
SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT)*(1+L_TAX)), AVG(L_QUANTITY),
AVG(L_EXTENDEDPRICE), AVG(L_DISCOUNT), COUNT(1)
FROM
lineitem
WHERE
L_SHIPDATE<='1998-09-02'
GROUP BY L_RETURNFLAG, L_LINESTATUS
ORDER BY L_RETURNFLAG, L_LIN
There is a segment in the query plan like this:
Filter Operator
predicate:
expr: (l_shipdate <= '1998-09-02')
type: boolean
Filter Operator
predicate:
expr: (l_shipdate <= '1998-09-02')
type: boolean
Why the filter operator repeatedly appear twice?
Similarly, for query
insert overwrite table q2_minimum_cost_supplier_tmp1
select
s.s_acctbal, s.s_name, n.n_name, p.p_partkey, ps.ps_supplycost,
p.p_mfgr, s.s_address, s.s_phone, s.s_comment
from
nation n join region r
on
n.n_regionkey = r.r_regionkey and r.r_name = 'EUROPE'
join supplier s
on
s.s_nationkey = n.n_nationkey
join partsupp ps
on
s.s_suppkey = ps.ps_suppkey
join part p
on
p.p_partkey = ps.ps_partkey and p.p_size = 15 and p.p_type like '%BRASS' ;
The filter operator also appear in the query multiple times.
Filter Operator
predicate:
expr: (p_type like '%BRASS')
type: boolean
Filter Operator
predicate:
expr: (p_size = 15)
type: boolean
Filter Operator
predicate:
expr: ((p_size = 15) and (p_type like '%BRASS'))
type: boolean
TableScan
alias: p
Thanks a lot!
Yingyi
Re: Filter Operator appear twice?
Posted by John Sichi <js...@facebook.com>.
Coincidentally, Amareshwari recently noticed the same thing on hive-dev and logged it here:
https://issues.apache.org/jira/browse/HIVE-1538
JVS
On Aug 13, 2010, at 8:21 AM, Namit Jain wrote:
> Currently, the hive optimizer tries to push the filter up, but probably does not remove the original filter.
> You can file a jira for that.
>
>
> Thanks,
> -namit
>
> ________________________________________
> From: Yingyi Bu [buyingyi@gmail.com]
> Sent: Friday, August 13, 2010 7:52 AM
> To: hive-user@hadoop.apache.org
> Subject: Filter Operator appear twice?
>
> Hi,
>
> Consider the following query:
>
> INSERT OVERWRITE TABLE q1_pricing_summary_report
> SELECT
> L_RETURNFLAG, L_LINESTATUS, SUM(L_QUANTITY), SUM(L_EXTENDEDPRICE),
> SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT)),
> SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT)*(1+L_TAX)), AVG(L_QUANTITY),
> AVG(L_EXTENDEDPRICE), AVG(L_DISCOUNT), COUNT(1)
> FROM
> lineitem
> WHERE
> L_SHIPDATE<='1998-09-02'
> GROUP BY L_RETURNFLAG, L_LINESTATUS
> ORDER BY L_RETURNFLAG, L_LIN
>
> There is a segment in the query plan like this:
> Filter Operator
> predicate:
> expr: (l_shipdate <= '1998-09-02')
> type: boolean
> Filter Operator
> predicate:
> expr: (l_shipdate <= '1998-09-02')
> type: boolean
>
>
> Why the filter operator repeatedly appear twice?
>
>
> Similarly, for query
> insert overwrite table q2_minimum_cost_supplier_tmp1
> select
> s.s_acctbal, s.s_name, n.n_name, p.p_partkey, ps.ps_supplycost,
> p.p_mfgr, s.s_address, s.s_phone, s.s_comment
> from
> nation n join region r
> on
> n.n_regionkey = r.r_regionkey and r.r_name = 'EUROPE'
> join supplier s
> on
> s.s_nationkey = n.n_nationkey
> join partsupp ps
> on
> s.s_suppkey = ps.ps_suppkey
> join part p
> on
> p.p_partkey = ps.ps_partkey and p.p_size = 15 and p.p_type like '%BRASS' ;
>
>
> The filter operator also appear in the query multiple times.
>
> Filter Operator
> predicate:
> expr: (p_type like '%BRASS')
> type: boolean
> Filter Operator
> predicate:
> expr: (p_size = 15)
> type: boolean
> Filter Operator
> predicate:
> expr: ((p_size = 15) and (p_type like '%BRASS'))
> type: boolean
> TableScan
> alias: p
>
>
> Thanks a lot!
>
> Yingyi
Re: Filter Operator appear twice?
Posted by Yingyi Bu <bu...@gmail.com>.
Ok, thanks a lot, Namit!
Yingyi
2010/8/13 Namit Jain <nj...@facebook.com>:
> Currently, the hive optimizer tries to push the filter up, but probably does not remove the original filter.
> You can file a jira for that.
>
>
> Thanks,
> -namit
>
> ________________________________________
> From: Yingyi Bu [buyingyi@gmail.com]
> Sent: Friday, August 13, 2010 7:52 AM
> To: hive-user@hadoop.apache.org
> Subject: Filter Operator appear twice?
>
> Hi,
>
> Consider the following query:
>
> INSERT OVERWRITE TABLE q1_pricing_summary_report
> SELECT
> L_RETURNFLAG, L_LINESTATUS, SUM(L_QUANTITY), SUM(L_EXTENDEDPRICE),
> SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT)),
> SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT)*(1+L_TAX)), AVG(L_QUANTITY),
> AVG(L_EXTENDEDPRICE), AVG(L_DISCOUNT), COUNT(1)
> FROM
> lineitem
> WHERE
> L_SHIPDATE<='1998-09-02'
> GROUP BY L_RETURNFLAG, L_LINESTATUS
> ORDER BY L_RETURNFLAG, L_LIN
>
> There is a segment in the query plan like this:
> Filter Operator
> predicate:
> expr: (l_shipdate <= '1998-09-02')
> type: boolean
> Filter Operator
> predicate:
> expr: (l_shipdate <= '1998-09-02')
> type: boolean
>
>
> Why the filter operator repeatedly appear twice?
>
>
> Similarly, for query
> insert overwrite table q2_minimum_cost_supplier_tmp1
> select
> s.s_acctbal, s.s_name, n.n_name, p.p_partkey, ps.ps_supplycost,
> p.p_mfgr, s.s_address, s.s_phone, s.s_comment
> from
> nation n join region r
> on
> n.n_regionkey = r.r_regionkey and r.r_name = 'EUROPE'
> join supplier s
> on
> s.s_nationkey = n.n_nationkey
> join partsupp ps
> on
> s.s_suppkey = ps.ps_suppkey
> join part p
> on
> p.p_partkey = ps.ps_partkey and p.p_size = 15 and p.p_type like '%BRASS' ;
>
>
> The filter operator also appear in the query multiple times.
>
> Filter Operator
> predicate:
> expr: (p_type like '%BRASS')
> type: boolean
> Filter Operator
> predicate:
> expr: (p_size = 15)
> type: boolean
> Filter Operator
> predicate:
> expr: ((p_size = 15) and (p_type like '%BRASS'))
> type: boolean
> TableScan
> alias: p
>
>
> Thanks a lot!
>
> Yingyi
>
RE: Filter Operator appear twice?
Posted by Namit Jain <nj...@facebook.com>.
Currently, the hive optimizer tries to push the filter up, but probably does not remove the original filter.
You can file a jira for that.
Thanks,
-namit
________________________________________
From: Yingyi Bu [buyingyi@gmail.com]
Sent: Friday, August 13, 2010 7:52 AM
To: hive-user@hadoop.apache.org
Subject: Filter Operator appear twice?
Hi,
Consider the following query:
INSERT OVERWRITE TABLE q1_pricing_summary_report
SELECT
L_RETURNFLAG, L_LINESTATUS, SUM(L_QUANTITY), SUM(L_EXTENDEDPRICE),
SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT)),
SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT)*(1+L_TAX)), AVG(L_QUANTITY),
AVG(L_EXTENDEDPRICE), AVG(L_DISCOUNT), COUNT(1)
FROM
lineitem
WHERE
L_SHIPDATE<='1998-09-02'
GROUP BY L_RETURNFLAG, L_LINESTATUS
ORDER BY L_RETURNFLAG, L_LIN
There is a segment in the query plan like this:
Filter Operator
predicate:
expr: (l_shipdate <= '1998-09-02')
type: boolean
Filter Operator
predicate:
expr: (l_shipdate <= '1998-09-02')
type: boolean
Why the filter operator repeatedly appear twice?
Similarly, for query
insert overwrite table q2_minimum_cost_supplier_tmp1
select
s.s_acctbal, s.s_name, n.n_name, p.p_partkey, ps.ps_supplycost,
p.p_mfgr, s.s_address, s.s_phone, s.s_comment
from
nation n join region r
on
n.n_regionkey = r.r_regionkey and r.r_name = 'EUROPE'
join supplier s
on
s.s_nationkey = n.n_nationkey
join partsupp ps
on
s.s_suppkey = ps.ps_suppkey
join part p
on
p.p_partkey = ps.ps_partkey and p.p_size = 15 and p.p_type like '%BRASS' ;
The filter operator also appear in the query multiple times.
Filter Operator
predicate:
expr: (p_type like '%BRASS')
type: boolean
Filter Operator
predicate:
expr: (p_size = 15)
type: boolean
Filter Operator
predicate:
expr: ((p_size = 15) and (p_type like '%BRASS'))
type: boolean
TableScan
alias: p
Thanks a lot!
Yingyi