You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hive.apache.org by Yingyi Bu <bu...@gmail.com> on 2010/08/13 16:52:19 UTC

Filter Operator appear twice?

Hi,

Consider the following query:

INSERT OVERWRITE TABLE q1_pricing_summary_report
SELECT
L_RETURNFLAG, L_LINESTATUS, SUM(L_QUANTITY), SUM(L_EXTENDEDPRICE),
SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT)),
SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT)*(1+L_TAX)), AVG(L_QUANTITY),
AVG(L_EXTENDEDPRICE), AVG(L_DISCOUNT), COUNT(1)
FROM
  lineitem
WHERE
  L_SHIPDATE<='1998-09-02'
  GROUP BY L_RETURNFLAG, L_LINESTATUS
  ORDER BY L_RETURNFLAG, L_LIN

There is a segment in the query plan like this:
Filter Operator
       predicate:
             expr: (l_shipdate <= '1998-09-02')
             type: boolean
                      Filter Operator
                                predicate:
                                   expr: (l_shipdate <= '1998-09-02')
                                    type: boolean


Why the filter operator repeatedly appear twice?


Similarly, for query
insert overwrite table q2_minimum_cost_supplier_tmp1
select
  s.s_acctbal, s.s_name, n.n_name, p.p_partkey, ps.ps_supplycost,
p.p_mfgr, s.s_address, s.s_phone, s.s_comment
from
  nation n join region r
  on
    n.n_regionkey = r.r_regionkey and r.r_name = 'EUROPE'
  join supplier s
  on
s.s_nationkey = n.n_nationkey
  join partsupp ps
  on
s.s_suppkey = ps.ps_suppkey
  join part p
  on
    p.p_partkey = ps.ps_partkey and p.p_size = 15 and p.p_type like '%BRASS' ;


The filter operator also appear in the query multiple times.

 Filter Operator
        predicate:
            expr: (p_type like '%BRASS')
             type: boolean
                     Filter Operator
                        predicate:
                          expr: (p_size = 15)
                          type: boolean
                         Filter Operator
                           predicate:
                              expr: ((p_size = 15) and (p_type like '%BRASS'))
                              type: boolean
                            TableScan
                              alias: p


Thanks a lot!

Yingyi

Re: Filter Operator appear twice?

Posted by John Sichi <js...@facebook.com>.
Coincidentally, Amareshwari recently noticed the same thing on hive-dev and logged it here:

https://issues.apache.org/jira/browse/HIVE-1538

JVS

On Aug 13, 2010, at 8:21 AM, Namit Jain wrote:

> Currently, the hive optimizer tries to  push the filter up, but probably does not remove the original filter.
> You can file a jira for that.
> 
> 
> Thanks,
> -namit
> 
> ________________________________________
> From: Yingyi Bu [buyingyi@gmail.com]
> Sent: Friday, August 13, 2010 7:52 AM
> To: hive-user@hadoop.apache.org
> Subject: Filter Operator appear twice?
> 
> Hi,
> 
> Consider the following query:
> 
> INSERT OVERWRITE TABLE q1_pricing_summary_report
> SELECT
> L_RETURNFLAG, L_LINESTATUS, SUM(L_QUANTITY), SUM(L_EXTENDEDPRICE),
> SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT)),
> SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT)*(1+L_TAX)), AVG(L_QUANTITY),
> AVG(L_EXTENDEDPRICE), AVG(L_DISCOUNT), COUNT(1)
> FROM
>  lineitem
> WHERE
>  L_SHIPDATE<='1998-09-02'
>  GROUP BY L_RETURNFLAG, L_LINESTATUS
>  ORDER BY L_RETURNFLAG, L_LIN
> 
> There is a segment in the query plan like this:
> Filter Operator
>       predicate:
>             expr: (l_shipdate <= '1998-09-02')
>             type: boolean
>                      Filter Operator
>                                predicate:
>                                   expr: (l_shipdate <= '1998-09-02')
>                                    type: boolean
> 
> 
> Why the filter operator repeatedly appear twice?
> 
> 
> Similarly, for query
> insert overwrite table q2_minimum_cost_supplier_tmp1
> select
>  s.s_acctbal, s.s_name, n.n_name, p.p_partkey, ps.ps_supplycost,
> p.p_mfgr, s.s_address, s.s_phone, s.s_comment
> from
>  nation n join region r
>  on
>    n.n_regionkey = r.r_regionkey and r.r_name = 'EUROPE'
>  join supplier s
>  on
> s.s_nationkey = n.n_nationkey
>  join partsupp ps
>  on
> s.s_suppkey = ps.ps_suppkey
>  join part p
>  on
>    p.p_partkey = ps.ps_partkey and p.p_size = 15 and p.p_type like '%BRASS' ;
> 
> 
> The filter operator also appear in the query multiple times.
> 
> Filter Operator
>        predicate:
>            expr: (p_type like '%BRASS')
>             type: boolean
>                     Filter Operator
>                        predicate:
>                          expr: (p_size = 15)
>                          type: boolean
>                         Filter Operator
>                           predicate:
>                              expr: ((p_size = 15) and (p_type like '%BRASS'))
>                              type: boolean
>                            TableScan
>                              alias: p
> 
> 
> Thanks a lot!
> 
> Yingyi


Re: Filter Operator appear twice?

Posted by Yingyi Bu <bu...@gmail.com>.
Ok, thanks a lot, Namit!

Yingyi

2010/8/13 Namit Jain <nj...@facebook.com>:
> Currently, the hive optimizer tries to  push the filter up, but probably does not remove the original filter.
> You can file a jira for that.
>
>
> Thanks,
> -namit
>
> ________________________________________
> From: Yingyi Bu [buyingyi@gmail.com]
> Sent: Friday, August 13, 2010 7:52 AM
> To: hive-user@hadoop.apache.org
> Subject: Filter Operator appear twice?
>
> Hi,
>
> Consider the following query:
>
> INSERT OVERWRITE TABLE q1_pricing_summary_report
> SELECT
> L_RETURNFLAG, L_LINESTATUS, SUM(L_QUANTITY), SUM(L_EXTENDEDPRICE),
> SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT)),
> SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT)*(1+L_TAX)), AVG(L_QUANTITY),
> AVG(L_EXTENDEDPRICE), AVG(L_DISCOUNT), COUNT(1)
> FROM
>  lineitem
> WHERE
>  L_SHIPDATE<='1998-09-02'
>  GROUP BY L_RETURNFLAG, L_LINESTATUS
>  ORDER BY L_RETURNFLAG, L_LIN
>
> There is a segment in the query plan like this:
> Filter Operator
>       predicate:
>             expr: (l_shipdate <= '1998-09-02')
>             type: boolean
>                      Filter Operator
>                                predicate:
>                                   expr: (l_shipdate <= '1998-09-02')
>                                    type: boolean
>
>
> Why the filter operator repeatedly appear twice?
>
>
> Similarly, for query
> insert overwrite table q2_minimum_cost_supplier_tmp1
> select
>  s.s_acctbal, s.s_name, n.n_name, p.p_partkey, ps.ps_supplycost,
> p.p_mfgr, s.s_address, s.s_phone, s.s_comment
> from
>  nation n join region r
>  on
>    n.n_regionkey = r.r_regionkey and r.r_name = 'EUROPE'
>  join supplier s
>  on
> s.s_nationkey = n.n_nationkey
>  join partsupp ps
>  on
> s.s_suppkey = ps.ps_suppkey
>  join part p
>  on
>    p.p_partkey = ps.ps_partkey and p.p_size = 15 and p.p_type like '%BRASS' ;
>
>
> The filter operator also appear in the query multiple times.
>
>  Filter Operator
>        predicate:
>            expr: (p_type like '%BRASS')
>             type: boolean
>                     Filter Operator
>                        predicate:
>                          expr: (p_size = 15)
>                          type: boolean
>                         Filter Operator
>                           predicate:
>                              expr: ((p_size = 15) and (p_type like '%BRASS'))
>                              type: boolean
>                            TableScan
>                              alias: p
>
>
> Thanks a lot!
>
> Yingyi
>

RE: Filter Operator appear twice?

Posted by Namit Jain <nj...@facebook.com>.
Currently, the hive optimizer tries to  push the filter up, but probably does not remove the original filter.
You can file a jira for that.


Thanks,
-namit

________________________________________
From: Yingyi Bu [buyingyi@gmail.com]
Sent: Friday, August 13, 2010 7:52 AM
To: hive-user@hadoop.apache.org
Subject: Filter Operator appear twice?

Hi,

Consider the following query:

INSERT OVERWRITE TABLE q1_pricing_summary_report
SELECT
L_RETURNFLAG, L_LINESTATUS, SUM(L_QUANTITY), SUM(L_EXTENDEDPRICE),
SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT)),
SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT)*(1+L_TAX)), AVG(L_QUANTITY),
AVG(L_EXTENDEDPRICE), AVG(L_DISCOUNT), COUNT(1)
FROM
  lineitem
WHERE
  L_SHIPDATE<='1998-09-02'
  GROUP BY L_RETURNFLAG, L_LINESTATUS
  ORDER BY L_RETURNFLAG, L_LIN

There is a segment in the query plan like this:
Filter Operator
       predicate:
             expr: (l_shipdate <= '1998-09-02')
             type: boolean
                      Filter Operator
                                predicate:
                                   expr: (l_shipdate <= '1998-09-02')
                                    type: boolean


Why the filter operator repeatedly appear twice?


Similarly, for query
insert overwrite table q2_minimum_cost_supplier_tmp1
select
  s.s_acctbal, s.s_name, n.n_name, p.p_partkey, ps.ps_supplycost,
p.p_mfgr, s.s_address, s.s_phone, s.s_comment
from
  nation n join region r
  on
    n.n_regionkey = r.r_regionkey and r.r_name = 'EUROPE'
  join supplier s
  on
s.s_nationkey = n.n_nationkey
  join partsupp ps
  on
s.s_suppkey = ps.ps_suppkey
  join part p
  on
    p.p_partkey = ps.ps_partkey and p.p_size = 15 and p.p_type like '%BRASS' ;


The filter operator also appear in the query multiple times.

 Filter Operator
        predicate:
            expr: (p_type like '%BRASS')
             type: boolean
                     Filter Operator
                        predicate:
                          expr: (p_size = 15)
                          type: boolean
                         Filter Operator
                           predicate:
                              expr: ((p_size = 15) and (p_type like '%BRASS'))
                              type: boolean
                            TableScan
                              alias: p


Thanks a lot!

Yingyi