Posted to dev@hive.apache.org by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org> on 2010/08/13 09:55:16 UTC
[jira] Commented: (HIVE-1538) FilterOperator is applied twice with ppd on.
[ https://issues.apache.org/jira/browse/HIVE-1538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898120#action_12898120 ]
Amareshwari Sriramadasu commented on HIVE-1538:
-----------------------------------------------
I see that if a query has a WHERE clause, the FilterOperator is applied twice.
EXPLAIN output for a query with a WHERE clause:
hive> explain select * from input1 where input1.key != 10;
{noformat}
OK
ABSTRACT SYNTAX TREE:
  (TOK_QUERY (TOK_FROM (TOK_TABREF input1)) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR TOK_ALLCOLREF)) (TOK_WHERE (!= (. (TOK_TABLE_OR_COL input1) key) 10))))

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Alias -> Map Operator Tree:
        input1
          TableScan
            alias: input1
            Filter Operator
              predicate:
                  expr: (key <> 10)
                  type: boolean
              Filter Operator
                predicate:
                    expr: (key <> 10)
                    type: boolean
                Select Operator
                  expressions:
                        expr: key
                        type: int
                        expr: value
                        type: int
                  outputColumnNames: _col0, _col1
                  File Output Operator
                    compressed: false
                    GlobalTableId: 0
                    table:
                        input format: org.apache.hadoop.mapred.TextInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat

  Stage: Stage-0
    Fetch Operator
      limit: -1

Time taken: 0.099 seconds
{noformat}
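The duplication is easy to miss in a longer plan. As a rough aid, the hypothetical Python helper below (duplicate_filters is not part of Hive) scans EXPLAIN text for back-to-back Filter Operator entries that carry the same predicate. It is only a sketch: it assumes a single linear map-side operator chain like the one above, because in plans with branches, textual adjacency does not imply a parent/child relationship.

```python
from typing import List


# Hypothetical helper, not part of Hive: report back-to-back "Filter Operator"
# entries with identical predicates in the text of a Hive EXPLAIN plan.
# Assumes one linear operator chain (no branches), as in the plan above.
def duplicate_filters(plan_text: str) -> List[str]:
    ops = []  # [operator name, predicate or None], in plan order
    for raw in plan_text.splitlines():
        line = raw.strip()  # indentation is ignored, so flattened output also works
        if line == "TableScan" or line.endswith("Operator"):
            ops.append([line, None])
        elif (line.startswith("expr:") and ops
              and ops[-1][0] == "Filter Operator" and ops[-1][1] is None):
            # The first expr: line after a Filter Operator is its predicate.
            ops[-1][1] = line[len("expr:"):].strip()
    return [
        pred_a
        for (name_a, pred_a), (name_b, pred_b) in zip(ops, ops[1:])
        if name_a == name_b == "Filter Operator" and pred_a == pred_b
    ]


# Excerpt of the map-side plan shown above.
plan = """
Filter Operator
predicate:
expr: (key <> 10)
type: boolean
Filter Operator
predicate:
expr: (key <> 10)
type: boolean
Select Operator
"""
print(duplicate_filters(plan))  # ['(key <> 10)']
```

Two adjacent filters with different predicates are not reported, since that plan shape can be legitimate.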
The mapper logs show the same duplication. The first FilterOperator (Id 1) does all
the filtering, and the second one (Id 2) always filters zero rows:
{noformat}
....
2010-08-13 13:20:21,451 INFO ExecMapper:
<MAP>Id =5
  <Children>
    <TS>Id =0
      <Children>
        <FIL>Id =1
          <Children>
            <FIL>Id =2
              <Children>
                <SEL>Id =3
                  <Children>
                    <FS>Id =4
                      <Parent>Id = 3 null<\Parent>
                    <\FS>
                  <\Children>
                  <Parent>Id = 2 null<\Parent>
                <\SEL>
              <\Children>
              <Parent>Id = 1 null<\Parent>
            <\FIL>
          <\Children>
          <Parent>Id = 0 null<\Parent>
        <\FIL>
      <\Children>
      <Parent>Id = 5 null<\Parent>
    <\TS>
  <\Children>
<\MAP>
...
2010-08-13 13:20:21,489 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 5 forwarding 1 rows
2010-08-13 13:20:21,489 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 forwarding 1 rows
2010-08-13 13:20:21,600 INFO ExecMapper: ExecMapper: processing 1 rows: used memory = 10765360
2010-08-13 13:20:21,600 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 5 finished. closing...
2010-08-13 13:20:21,600 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 5 forwarded 1 rows
2010-08-13 13:20:21,600 INFO org.apache.hadoop.hive.ql.exec.MapOperator: DESERIALIZE_ERRORS:0
2010-08-13 13:20:21,600 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 finished. closing...
2010-08-13 13:20:21,600 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 forwarded 1 rows
2010-08-13 13:20:21,600 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 1 finished. closing...
2010-08-13 13:20:21,600 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 1 forwarded 0 rows
2010-08-13 13:20:21,600 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: PASSED:0
2010-08-13 13:20:21,600 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: FILTERED:1
2010-08-13 13:20:21,600 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 2 finished. closing...
2010-08-13 13:20:21,600 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 2 forwarded 0 rows
2010-08-13 13:20:21,600 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: PASSED:0
2010-08-13 13:20:21,600 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: FILTERED:0
2010-08-13 13:20:21,600 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 3 finished. closing...
2010-08-13 13:20:21,600 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 3 forwarded 0 rows
2010-08-13 13:20:21,600 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 4 finished. closing...
2010-08-13 13:20:21,600 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 4 forwarded 0 rows
2010-08-13 13:20:21,600 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Final Path: FS hdfs://localhost:19000/tmp/hive-amarsri/hive_2010-08-13_13-20-11_483_2065579562420016208/_tmp.-ext-10001/000000_0
2010-08-13 13:20:21,601 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Writing to temp file: FS hdfs://localhost:19000/tmp/hive-amarsri/hive_2010-08-13_13-20-11_483_2065579562420016208/_tmp.-ext-10001/_tmp.000000_0
2010-08-13 13:20:21,604 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: New Final Path: FS hdfs://localhost:19000/tmp/hive-amarsri/hive_2010-08-13_13-20-11_483_2065579562420016208/_tmp.-ext-10001/000000_0
2010-08-13 13:20:21,629 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 3 Close done
2010-08-13 13:20:21,629 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 2 Close done
2010-08-13 13:20:21,629 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 1 Close done
2010-08-13 13:20:21,629 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 Close done
2010-08-13 13:20:21,629 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 5 Close done
2010-08-13 13:20:21,629 INFO ExecMapper: ExecMapper: processed 1 rows: used memory = 11454224
...
{noformat}
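Why the second filter never drops anything: it only receives rows that the first filter already accepted, and applying the same predicate again cannot reject any of them. The toy Python model below makes this concrete for a FIL 1 -> FIL 2 chain. The ToyFilter class, the key_not_10 predicate and the input rows are illustrative assumptions, not Hive's FilterOperator or the data from the job above, and Hive's NULL semantics are not modeled.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, int]


@dataclass
class ToyFilter:
    """Illustrative stand-in for a filter operator (not Hive's FilterOperator).
    Keeps PASSED/FILTERED counts, loosely mirroring the log lines above."""
    op_id: int
    predicate: Callable[[Row], bool]
    passed: int = 0
    filtered: int = 0

    def process(self, rows: List[Row]) -> List[Row]:
        out = []
        for row in rows:
            if self.predicate(row):
                self.passed += 1
                out.append(row)
            else:
                self.filtered += 1
        return out


def key_not_10(row: Row) -> bool:
    # Python-side version of (key <> 10); Hive's NULL handling is not modeled.
    return row["key"] != 10


fil1 = ToyFilter(1, key_not_10)
fil2 = ToyFilter(2, key_not_10)  # same predicate, applied a second time

# Hypothetical input rows, not the data from the job above.
rows = [{"key": 10, "value": 1}, {"key": 3, "value": 2},
        {"key": 10, "value": 3}, {"key": 7, "value": 4}]
result = fil2.process(fil1.process(rows))

print(fil1.passed, fil1.filtered)  # 2 2
print(fil2.passed, fil2.filtered)  # 2 0
```

Whatever the input, every row reaching FIL 2 already satisfies key <> 10, so its FILTERED count stays at 0: filtering by the same predicate twice gives the same result as filtering once, and the second operator only adds one extra predicate evaluation per row.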
> FilterOperator is applied twice with ppd on.
> --------------------------------------------
>
> Key: HIVE-1538
> URL: https://issues.apache.org/jira/browse/HIVE-1538
> Project: Hadoop Hive
> Issue Type: Bug
> Components: Query Processor
> Reporter: Amareshwari Sriramadasu
>
> With hive.optimize.ppd set to true, FilterOperator is applied twice, and the second operator always seems to filter zero rows.
--
This message is automatically generated by JIRA.