Posted to dev@hive.apache.org by "He Yongqiang (JIRA)" <ji...@apache.org> on 2011/03/02 01:29:37 UTC

[jira] Commented: (HIVE-1644) use filter pushdown for automatically accessing indexes

    [ https://issues.apache.org/jira/browse/HIVE-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13001208#comment-13001208 ] 

He Yongqiang commented on HIVE-1644:
------------------------------------

I took a quick look at the HIVE-1644.4.patch itself.

Some comments:
1) Add a test case for CombineHiveInputFormat.
2) In the new test case, the newly added conf "hive.optimize.autoindex" is never used?
3) I think there is already an API in Hive.java for getting all indexes on a table, no? Please double-check. If not, rename getIndexesOnTable to getIndexes.
4) In GenMRTableScan1.java, it is not good to hardcode the input format name. Why not just use indexClassName?
5) In ExecDriver.java, it is also not good to hardcode the conf name "hive.index.compact.file", because a bitmap index may want to use a different name. Maybe this should be delegated to an index-type-specific class (see the sketch after this list).
6) In generateIndexQuery, the temp directory is not random, so it could conflict with other index subqueries in the same query. The dir path also should not be generated there; it should be generated in the optimizer, which has global control (also covered in the sketch below). And I think "insert overwrite directory 'full_path_to_a_dir' select .." would fail if full_path_to_a_dir (or its parent) does not exist; please check.
7) In generateIndexQuery, what is this used for?
+    ParseContext indexQueryPctx = RewriteParseContextGenerator.generateOperatorTree(pctx.getConf(), qlCommand);
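
For 5) and 6), a rough sketch of the shape I have in mind: each index type exposes its own conf key and input format instead of ExecDriver/GenMRTableScan1 hardcoding the compact-index ones, and the optimizer hands out unique temp dirs and creates them up front. All class and method names below are hypothetical, just to illustrate the idea; only the Hadoop FileSystem/Path calls are real.

    // Hypothetical sketch -- names are illustrative, not existing Hive classes.
    import java.io.IOException;
    import java.util.UUID;

    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hive.conf.HiveConf;

    /** Per-index-type hook so ExecDriver/GenMRTableScan1 never hardcode
        "hive.index.compact.file" or the compact input format class name. */
    interface IndexTypeSpecificInfo {
      /** Conf key under which this index type publishes its intermediate
          result file (compact and bitmap indexes can differ here). */
      String getIntermediateFileConfKey();

      /** Input format class name to use when scanning with this index. */
      String getIndexInputFormatClassName();
    }

    class IndexScratchDirs {
      /**
       * Meant to be called from the optimizer, which has global visibility
       * over the whole query, so two index subqueries never collide on the
       * same directory.
       */
      static Path createUniqueTmpDir(HiveConf conf, String scratchRoot)
          throws IOException {
        Path dir = new Path(scratchRoot, "hive-index-tmp-" + UUID.randomUUID());
        FileSystem fs = dir.getFileSystem(conf);
        // "INSERT OVERWRITE DIRECTORY ..." can fail when the target (or its
        // parent) does not exist, so create the full path before the
        // rewritten index query runs.
        if (!fs.mkdirs(dir)) {
          throw new IOException("could not create index temp dir " + dir);
        }
        return dir;
      }
    }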


Also, today the index optimizer runs before the task tree is broken up, so the index scan task is generated before the task for the original table scan, which makes it very hard to hook the two together. The only way I can think of is to remember the operator id of the original table scan and do another pass to hook them together after the task tree is broken up, but I think that is too hacky.

Maybe a better way to do it is in the physical optimizer. In the physical optimizer, Hive presents a task tree, and the optimizer can go through each task and do the same rewrite (since each task carries its own operator tree). That makes it much easier to manage the task dependency, and I think most of the code would stay the same. For complex queries this approach would also be cleaner. A rough sketch follows.
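
To make the physical-optimizer idea concrete, here is roughly what such a pass could look like. The PhysicalPlanResolver/PhysicalContext plumbing is assumed from the existing physical optimizer (please double-check the exact accessors), and tryGenerateIndexScanTask is only a hypothetical stand-in for the patch's generateIndexQuery logic, so treat this as pseudocode in Java form.

    // Sketch only: the rewrite helper is made up, and the resolver plumbing
    // is assumed from the existing physical optimizer.
    import org.apache.hadoop.hive.ql.exec.Task;
    import org.apache.hadoop.hive.ql.optimizer.physical.PhysicalContext;
    import org.apache.hadoop.hive.ql.optimizer.physical.PhysicalPlanResolver;
    import org.apache.hadoop.hive.ql.parse.SemanticException;

    public class IndexWhereResolver implements PhysicalPlanResolver {

      @Override
      public PhysicalContext resolve(PhysicalContext pctx) throws SemanticException {
        // Every task already carries its own operator tree here, so the
        // per-task rewrite can reuse the existing generateIndexQuery code,
        // and ordering the index scan before the original scan is plain
        // task dependency wiring.
        for (Task<?> task : pctx.getRootTasks()) {
          Task<?> indexScanTask = tryGenerateIndexScanTask(task, pctx);
          if (indexScanTask != null) {
            // run the index scan first, then the original task
            indexScanTask.addDependentTask(task);
            // ...and swap indexScanTask in for task in the root task list
            // (plumbing elided).
          }
        }
        return pctx;
      }

      // Hypothetical stand-in for the generateIndexQuery logic in the patch:
      // look for a TableScanOperator with a pushed-down filter an index can
      // answer, build the rewritten index query, and return its root task
      // (or null if no usable index).
      private Task<?> tryGenerateIndexScanTask(Task<?> task, PhysicalContext pctx) {
        return null; // rewrite logic goes here
      }
    }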


> use filter pushdown for automatically accessing indexes
> -------------------------------------------------------
>
>                 Key: HIVE-1644
>                 URL: https://issues.apache.org/jira/browse/HIVE-1644
>             Project: Hive
>          Issue Type: Improvement
>          Components: Indexing
>    Affects Versions: 0.7.0
>            Reporter: John Sichi
>            Assignee: Russell Melick
>         Attachments: HIVE-1644.1.patch, HIVE-1644.2.patch, HIVE-1644.3.patch, HIVE-1644.4.patch
>
>
> HIVE-1226 provides utilities for analyzing filters which have been pushed down to a table scan.  The next step is to use these for selecting available indexes and generating access plans for those indexes.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira