Posted to issues@hive.apache.org by "Jason Dere (JIRA)" <ji...@apache.org> on 2017/02/23 06:52:48 UTC
[jira] [Commented] (HIVE-16022) BloomFilter check not showing up in MERGE statement queries
[ https://issues.apache.org/jira/browse/HIVE-16022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15879999#comment-15879999 ]
Jason Dere commented on HIVE-16022:
-----------------------------------
I noticed two problems when running the semijoin optimization on a MERGE statement:
- DynamicPartitionPruningOptimization.generateSemiJoinOperator(): parentOfRS does not have to be a SelectOperator; in this case it is a TableScan (TS). As a result, we skip some important checks on whether this table is appropriate for the semijoin optimization.
- grandParent.getChildren().add(bloomFilterNode): this wrongly assumes grandParent is an AND. In this case there was no previous filterExpr, so grandParent is the BETWEEN. Adding the child there incorrectly adds a new parameter to BETWEEN, which is probably ignored. This is why in_bloom_filter() does not appear in the EXPLAIN output.
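To illustrate the second point, here is a minimal standalone sketch. It does not use Hive's expression classes: the Expr type, the helper names (appendBlindly, appendSafely), and the leaf names "min", "max", and "bloom" are all hypothetical stand-ins. It is not the change in the attached patch; it only shows why appending a child to a node assumed to be AND goes wrong when that node is really BETWEEN, and what a guarded version might look like.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BloomFilterPredicateSketch {

    /** Toy expression node (hypothetical, not a Hive class): an operator name plus children. */
    static final class Expr {
        final String op;
        final List<Expr> children;

        Expr(String op, Expr... kids) {
            this.op = op;
            this.children = new ArrayList<>(Arrays.asList(kids));
        }

        @Override
        public String toString() {
            if (children.isEmpty()) {
                return op;
            }
            StringBuilder sb = new StringBuilder(op).append('(');
            for (int i = 0; i < children.size(); i++) {
                if (i > 0) {
                    sb.append(", ");
                }
                sb.append(children.get(i));
            }
            return sb.append(')').toString();
        }
    }

    /** The min/max predicate when no earlier filter expression exists: a bare BETWEEN. */
    static Expr minMaxPredicate() {
        return new Expr("BETWEEN", new Expr("a"), new Expr("min"), new Expr("max"));
    }

    /** Mirrors the problem described above: assumes root is AND and appends unconditionally. */
    static Expr appendBlindly(Expr root, Expr bloom) {
        root.children.add(bloom);
        return root;
    }

    /** Guarded variant: append only to an AND; otherwise wrap both predicates in a new AND. */
    static Expr appendSafely(Expr root, Expr bloom) {
        if ("AND".equals(root.op)) {
            root.children.add(bloom);
            return root;
        }
        return new Expr("AND", root, bloom);
    }

    public static void main(String[] args) {
        Expr bloom = new Expr("in_bloom_filter", new Expr("a"), new Expr("bloom"));

        // The bloom filter check ends up as a stray fourth argument of BETWEEN.
        System.out.println(appendBlindly(minMaxPredicate(), bloom));
        // prints: BETWEEN(a, min, max, in_bloom_filter(a, bloom))

        // The bloom filter check becomes a sibling of BETWEEN under AND.
        System.out.println(appendSafely(minMaxPredicate(), bloom));
        // prints: AND(BETWEEN(a, min, max), in_bloom_filter(a, bloom))
    }
}
```

In the first result the extra BETWEEN argument would presumably be dropped when the predicate is evaluated or printed, which matches the EXPLAIN output in the description showing only the min/max check.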
> BloomFilter check not showing up in MERGE statement queries
> -----------------------------------------------------------
>
> Key: HIVE-16022
> URL: https://issues.apache.org/jira/browse/HIVE-16022
> Project: Hive
> Issue Type: Bug
> Components: Query Planning
> Reporter: Jason Dere
> Assignee: Jason Dere
> Attachments: HIVE-16022.1.patch
>
>
> Running explain on a MERGE statement with runtime filtering enabled, I see the min/max being applied on the large table, but not the bloom filter check:
> {noformat}
> explain merge into acidTbl as t using nonAcidOrcTbl s ON t.a = s.a
> WHEN MATCHED AND s.a > 8 THEN DELETE
> WHEN MATCHED THEN UPDATE SET b = 7
> WHEN NOT MATCHED THEN INSERT VALUES(s.a, s.b)
> ...
> Map 1
>     Map Operator Tree:
>         TableScan
>           alias: t
>           Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
>           Filter Operator
>             predicate: a BETWEEN DynamicValue(RS_3_s_a_min) AND DynamicValue(RS_3_s_a_max) (type: boolean)
>             Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
> {noformat}
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)