Posted to dev@hive.apache.org by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org> on 2010/08/13 09:55:16 UTC

[jira] Commented: (HIVE-1538) FilterOperator is applied twice with ppd on.

    [ https://issues.apache.org/jira/browse/HIVE-1538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898120#action_12898120 ] 

Amareshwari Sriramadasu commented on HIVE-1538:
-----------------------------------------------

I see that if a query has a WHERE clause, the FilterOperator is applied twice.

EXPLAIN output for a query with a WHERE clause:
hive> explain select * from input1 where input1.key != 10;
{noformat}
OK
ABSTRACT SYNTAX TREE:
  (TOK_QUERY (TOK_FROM (TOK_TABREF input1)) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR TOK_ALLCOLREF)) (TOK_WHERE (!= (. (TOK_TABLE_OR_COL input1) key) 10))))

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Alias -> Map Operator Tree:
        input1
          TableScan
            alias: input1
            Filter Operator
              predicate:
                  expr: (key <> 10)
                  type: boolean
              Filter Operator
                predicate:
                    expr: (key <> 10)
                    type: boolean
                Select Operator
                  expressions:
                        expr: key
                        type: int
                        expr: value
                        type: int
                  outputColumnNames: _col0, _col1
                  File Output Operator
                    compressed: false
                    GlobalTableId: 0
                    table:
                        input format: org.apache.hadoop.mapred.TextInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat

  Stage: Stage-0
    Fetch Operator
      limit: -1
Time taken: 0.099 seconds
{noformat}
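
To check other plans for the same pattern, here is a rough Python sketch that scans EXPLAIN text for a Filter Operator whose immediate child is another Filter Operator with the same predicate. The function name find_duplicate_filters and the SAMPLE_PLAN excerpt are only illustrative and are not part of Hive; the sketch assumes the indentation-based layout shown in the plan above.

```python
# Hypothetical helper, not part of Hive: flags a Filter Operator that is
# immediately followed by a nested Filter Operator with the same predicate.
# Assumes the indentation-based EXPLAIN layout shown in the plan above.

SAMPLE_PLAN = """\
          TableScan
            alias: input1
            Filter Operator
              predicate:
                  expr: (key <> 10)
                  type: boolean
              Filter Operator
                predicate:
                    expr: (key <> 10)
                    type: boolean
                Select Operator
                  expressions:
                        expr: key
                        type: int
"""


def find_duplicate_filters(plan_text):
    """Return predicates repeated by a Filter Operator and its child filter."""
    operators = []  # each entry: [indent, operator name, predicate or None]
    for line in plan_text.splitlines():
        stripped = line.strip()
        indent = len(line) - len(line.lstrip())
        if stripped.endswith(" Operator") or stripped == "TableScan":
            operators.append([indent, stripped, None])
        elif (stripped.startswith("expr:") and operators
              and operators[-1][1] == "Filter Operator"
              and operators[-1][2] is None):
            # First expr: line after a Filter Operator is its predicate.
            operators[-1][2] = stripped[len("expr:"):].strip()

    duplicates = []
    for parent, child in zip(operators, operators[1:]):
        if (parent[1] == "Filter Operator"
                and child[1] == "Filter Operator"
                and child[0] > parent[0]      # child is nested deeper
                and parent[2] == child[2]):   # identical predicate text
            duplicates.append(parent[2])
    return duplicates


if __name__ == "__main__":
    print(find_duplicate_filters(SAMPLE_PLAN))  # ['(key <> 10)']
```

This only compares predicate text, so it would miss filters that are equivalent but printed differently.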

The mapper logs show the same thing. The first FilterOperator does the
filtering, and the second always filters zero rows.

{noformat}
....
2010-08-13 13:20:21,451 INFO ExecMapper: 
<MAP>Id =5
  <Children>
    <TS>Id =0
      <Children>
        <FIL>Id =1
          <Children>
            <FIL>Id =2
              <Children>
                <SEL>Id =3
                  <Children>
                    <FS>Id =4
                      <Parent>Id = 3 null<\Parent>
                    <\FS>
                  <\Children>
                  <Parent>Id = 2 null<\Parent>
                <\SEL>
              <\Children>
              <Parent>Id = 1 null<\Parent>
            <\FIL>
          <\Children>
          <Parent>Id = 0 null<\Parent>
        <\FIL>
      <\Children>
      <Parent>Id = 5 null<\Parent>
    <\TS>
  <\Children>
<\MAP>
...
2010-08-13 13:20:21,489 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 5 forwarding 1 rows
2010-08-13 13:20:21,489 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 forwarding 1 rows
2010-08-13 13:20:21,600 INFO ExecMapper: ExecMapper: processing 1 rows: used memory = 10765360
2010-08-13 13:20:21,600 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 5 finished. closing... 
2010-08-13 13:20:21,600 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 5 forwarded 1 rows
2010-08-13 13:20:21,600 INFO org.apache.hadoop.hive.ql.exec.MapOperator: DESERIALIZE_ERRORS:0
2010-08-13 13:20:21,600 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 finished. closing... 
2010-08-13 13:20:21,600 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 forwarded 1 rows
2010-08-13 13:20:21,600 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 1 finished. closing... 
2010-08-13 13:20:21,600 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 1 forwarded 0 rows
2010-08-13 13:20:21,600 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: PASSED:0
2010-08-13 13:20:21,600 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: FILTERED:1
2010-08-13 13:20:21,600 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 2 finished. closing... 
2010-08-13 13:20:21,600 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 2 forwarded 0 rows
2010-08-13 13:20:21,600 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: PASSED:0
2010-08-13 13:20:21,600 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: FILTERED:0
2010-08-13 13:20:21,600 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 3 finished. closing... 
2010-08-13 13:20:21,600 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 3 forwarded 0 rows
2010-08-13 13:20:21,600 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 4 finished. closing... 
2010-08-13 13:20:21,600 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 4 forwarded 0 rows
2010-08-13 13:20:21,600 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Final Path: FS hdfs://localhost:19000/tmp/hive-amarsri/hive_2010-08-13_13-20-11_483_2065579562420016208/_tmp.-ext-10001/000000_0
2010-08-13 13:20:21,601 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Writing to temp file: FS hdfs://localhost:19000/tmp/hive-amarsri/hive_2010-08-13_13-20-11_483_2065579562420016208/_tmp.-ext-10001/_tmp.000000_0
2010-08-13 13:20:21,604 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: New Final Path: FS hdfs://localhost:19000/tmp/hive-amarsri/hive_2010-08-13_13-20-11_483_2065579562420016208/_tmp.-ext-10001/000000_0
2010-08-13 13:20:21,629 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 3 Close done
2010-08-13 13:20:21,629 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 2 Close done
2010-08-13 13:20:21,629 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 1 Close done
2010-08-13 13:20:21,629 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 Close done
2010-08-13 13:20:21,629 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 5 Close done
2010-08-13 13:20:21,629 INFO ExecMapper: ExecMapper: processed 1 rows: used memory = 11454224
...
{noformat}
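
To pull these counters out of a longer mapper log, something like the following Python sketch may help. It groups the PASSED and FILTERED lines by FilterOperator id. The function name filter_counters is made up for illustration, and the sketch assumes that the counter lines directly follow the "finished. closing..." line of the operator they belong to, as they do in the excerpt above.

```python
import re

# Hypothetical helper, not part of Hive. Assumes PASSED/FILTERED lines follow
# the "N finished. closing..." line of the operator they belong to.
CLOSING = re.compile(r"(\w+Operator): (\d+) finished\. closing")
COUNTER = re.compile(r"(\w+Operator): (PASSED|FILTERED):(\d+)")

# Lines copied from the mapper log above.
SAMPLE_LOG = """\
2010-08-13 13:20:21,600 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 1 finished. closing...
2010-08-13 13:20:21,600 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 1 forwarded 0 rows
2010-08-13 13:20:21,600 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: PASSED:0
2010-08-13 13:20:21,600 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: FILTERED:1
2010-08-13 13:20:21,600 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 2 finished. closing...
2010-08-13 13:20:21,600 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 2 forwarded 0 rows
2010-08-13 13:20:21,600 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: PASSED:0
2010-08-13 13:20:21,600 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: FILTERED:0
"""


def filter_counters(log_text):
    """Map each FilterOperator id to its PASSED and FILTERED counts."""
    counters = {}
    current_id = None  # id of the FilterOperator currently being closed
    for line in log_text.splitlines():
        closing = CLOSING.search(line)
        if closing:
            is_filter = closing.group(1) == "FilterOperator"
            current_id = int(closing.group(2)) if is_filter else None
            continue
        counter = COUNTER.search(line)
        if (counter and counter.group(1) == "FilterOperator"
                and current_id is not None):
            name, value = counter.group(2), int(counter.group(3))
            counters.setdefault(current_id, {})[name] = value
    return counters


if __name__ == "__main__":
    for op_id, counts in sorted(filter_counters(SAMPLE_LOG).items()):
        print(op_id, counts)
```

On the excerpt above, operator 1 comes out with FILTERED=1 and operator 2 with FILTERED=0, which matches the observation that the second filter never removes a row.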


> FilterOperator is applied twice with ppd on.
> --------------------------------------------
>
>                 Key: HIVE-1538
>                 URL: https://issues.apache.org/jira/browse/HIVE-1538
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Amareshwari Sriramadasu
>
> With hive.optimize.ppd set to true, FilterOperator is applied twice, and the second operator seems to always filter zero rows.
