Posted to issues@hive.apache.org by "Swarnim Kulkarni (JIRA)" <ji...@apache.org> on 2015/08/30 07:27:45 UTC

[jira] [Commented] (HIVE-11609) Capability to add a filter to hbase scan via composite key doesn't work

    [ https://issues.apache.org/jira/browse/HIVE-11609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14721391#comment-14721391 ] 

Swarnim Kulkarni commented on HIVE-11609:
-----------------------------------------

Here are the results from my testing with and without this patch applied. The table "my_table" used for this testing contains about 8 million rows.

*Restrict query by single key*:

Example query: select * from my_table where key.firstpart="something";

|| Memory (in MB) || With patch || Without patch ||
| 1500 | Out of memory | Out of memory |
| 3000 | 2.5 minutes | Out of memory |
| 6000 | 2.4 minutes | 23 minutes |

*Restrict query by multiple key parts*: (Note that the key parts must be successive for this to work.)

Example query: select * from my_table where key.firstpart="something" and key.secondpart="something2";

|| Memory (in MB) || With filter || Without filter ||
| 1500 | 23 sec | Out of memory |
| 3000 | 19 sec | Out of memory |
| 6000 | 18.8 sec | 24 minutes |

So the filter becomes more restrictive, and the query more efficient, as more key parts are specified in the query. To toggle between using the filter and not using it, I set the hive.optimize.ppd.storage flag to false so that no predicate pushdown happens.
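
For reference, the toggle described above can be sketched roughly as follows. This assumes the flag is changed per session with a `set` statement before the example query is run; the exact session setup (including how the memory limits were configured) behind the numbers above is not shown here.

```sql
-- Baseline run: disable predicate pushdown to the storage handler,
-- so the filter is not applied to the HBase scan.
set hive.optimize.ppd.storage=false;
select * from my_table where key.firstpart="something" and key.secondpart="something2";

-- Filter run: re-enable pushdown (true is the Hive default for this flag).
set hive.optimize.ppd.storage=true;
select * from my_table where key.firstpart="something" and key.secondpart="something2";
```

Running the same query under both settings keeps everything else constant, so any difference in run time can be attributed to the pushed-down filter.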

Finally, the same query without an M/R job:

*Restrict query by multiple key parts*: (No M/R job)

Example query: select * from my_table where key.firstpart="something" and key.secondpart="something2";

|| Memory (in MB) || With filter || Without filter ||
| 3000 | 5 sec | 19 minutes |

> Capability to add a filter to hbase scan via composite key doesn't work
> -----------------------------------------------------------------------
>
>                 Key: HIVE-11609
>                 URL: https://issues.apache.org/jira/browse/HIVE-11609
>             Project: Hive
>          Issue Type: Bug
>          Components: HBase Handler
>            Reporter: Swarnim Kulkarni
>            Assignee: Swarnim Kulkarni
>         Attachments: HIVE-11609.1.patch.txt
>
>
> It seems like the capability to add a filter to an HBase scan, which was added as part of HIVE-6411, doesn't work. This is primarily because in HiveHBaseInputFormat the filter is added in getSplits instead of getRecordReader. This works fine for start and stop keys, but not for a filter, because a filter is respected only when an actual scan is performed. This is also related to the initial refactoring that was done as part of HIVE-3420.


